### Preprints — Published articles

*An adaptive transport framework for joint and conditional density estimation,*Preprint, (2020).

##### An adaptive transport framework for joint and conditional density estimation

We propose a general framework to robustly characterize joint and conditional probability distributions via transport maps. Transport maps or "flows" deterministically couple two distributions via an expressive monotone transformation. Yet, learning the parameters of such transformations in high dimensions is challenging given few samples from the unknown target distribution, and structural choices for these transformations can have a significant impact on performance. Here we formulate a systematic framework for representing and learning monotone maps, via invertible transformations of smooth functions, and demonstrate that the associated minimization problem has a unique global optimum. Given a hierarchical basis for the appropriate function space, we propose a sample-efficient adaptive algorithm that estimates a sparse approximation for the map. We demonstrate how this framework can learn densities with stable generalization performance across a wide range of sample sizes on real-world datasets.

##### BibTeX entry

@article{baptista-atm-2020, author = "R. Baptista and O. Zahm and Y. M. Marzouk", journal = "Preprint", title = "An adaptive transport framework for joint and conditional density estimation", year = "2020", abstract = "We propose a general framework to robustly characterize joint and conditional probability distributions via transport maps. Transport maps or "flows" deterministically couple two distributions via an expressive monotone transformation. Yet, learning the parameters of such transformations in high dimensions is challenging given few samples from the unknown target distribution, and structural choices for these transformations can have a significant impact on performance. Here we formulate a systematic framework for representing and learning monotone maps, via invertible transformations of smooth functions, and demonstrate that the associated minimization problem has a unique global optimum. Given a hierarchical basis for the appropriate function space, we propose a sample-efficient adaptive algorithm that estimates a sparse approximation for the map. We demonstrate how this framework can learn densities with stable generalization performance across a wide range of sample sizes on real-world datasets." }

*Batch greedy maximization of non-submodular functions: guarantees and applications to experimental design,*Preprint, (2020).

##### Batch greedy maximization of non-submodular functions: guarantees and applications to experimental design

We propose and analyze batch greedy heuristics for cardinality constrained maximization of non-submodular non-decreasing set functions. Our theoretical guarantees are characterized by the combination of submodularity and supermodularity ratios. We argue how these parameters define tight modular bounds based on incremental gains, and provide a novel reinterpretation of the classical greedy algorithm using the minorize-maximize (MM) principle. Based on that analogy, we propose a new class of methods exploiting any plausible modular bound. In the context of optimal experimental design for linear Bayesian inverse problems, we bound the submodularity and supermodularity ratios when the underlying objective is based on mutual information. We also develop novel modular bounds for the mutual information in this setting, and describe certain connections to polyhedral combinatorics. We discuss how algorithms using these modular bounds relate to established statistical notions such as leverage scores and to more recent efforts such as volume sampling. We demonstrate our theoretical findings on synthetic problems and on a real-world climate monitoring example.

##### BibTeX entry

@article{jagalur-batchgreedy-2020, author = "J. Jagalur-Mohan and Y. M. Marzouk", journal = "Preprint", title = "Batch greedy maximization of non-submodular functions: guarantees and applications to experimental design", year = "2020", abstract = "We propose and analyze batch greedy heuristics for cardinality constrained maximization of non-submodular non-decreasing set functions. Our theoretical guarantees are characterized by the combination of submodularity and supermodularity ratios. We argue how these parameters define tight modular bounds based on incremental gains, and provide a novel reinterpretation of the classical greedy algorithm using the minorize-maximize (MM) principle. Based on that analogy, we propose a new class of methods exploiting any plausible modular bound. In the context of optimal experimental design for linear Bayesian inverse problems, we bound the submodularity and supermodularity ratios when the underlying objective is based on mutual information. We also develop novel modular bounds for the mutual information in this setting, and describe certain connections to polyhedral combinatorics. We discuss how algorithms using these modular bounds relate to established statistical notions such as leverage scores and to more recent efforts such as volume sampling. We demonstrate our theoretical findings on synthetic problems and on a real-world climate monitoring example.", keywords = "greedy methods, submodularity, non-submodular functions, optimal experi- mental design, inverse problems, mutual information, uncertainty quantification, Bayesian statistics" }

*Sparse approximation of triangular transports on bounded domains,*Preprint, (2020).

##### Sparse approximation of triangular transports on bounded domains

Let $\rho$ and $\pi$ be two probability measures on

$[-1,1]^d$ with positive and analytic Lebesgue densities. We

investigate the approximation of the unique triangular monotone

(Knothe–Rosenblatt) transport $T:[-1,1]^d\to [-1,1]^d$, such that

the pushforward $T_\sharp\rho$ equals $\pi$. It is shown that

for $d\in\N$ there exist approximations $\tilde T$ of $T$ based on

either sparse polynomial expansions or ReLU networks, such that the

distance between $\tilde T_\sharp\rho$ and $\pi$ decreases

exponentially. More precisely, we show error bounds of the type

$\exp(-\beta N^{1/d})$ (or $\exp(-\beta N^{1/(d+1)})$ for neural

networks), where $N$ refers to the dimension of the ansatz space (or

the size of the network) containing $\tilde T$; the notion of

distance comprises, among others, the Hellinger distance and the

Kullback–Leibler divergence. The construction guarantees $\tilde T$

to be a monotone triangular bijective transport on the hypercube

$[-1,1]^d$. Analogous results hold for the inverse transport

$S=T^{-1}$. The proofs are constructive, and we give an explicit

\textit{a priori} description of the ansatz space, which can be used

for numerical implementations. Additionally we discuss the

high-dimensional case: for $d=\infty$ a dimension-independent

algebraic convergence rate is proved for a class of probability

measures occurring widely in Bayesian inference

for uncertainty quantification, thus verifying that the curse of

dimensionality can be overcome in this setting.

$[-1,1]^d$ with positive and analytic Lebesgue densities. We

investigate the approximation of the unique triangular monotone

(Knothe–Rosenblatt) transport $T:[-1,1]^d\to [-1,1]^d$, such that

the pushforward $T_\sharp\rho$ equals $\pi$. It is shown that

for $d\in\N$ there exist approximations $\tilde T$ of $T$ based on

either sparse polynomial expansions or ReLU networks, such that the

distance between $\tilde T_\sharp\rho$ and $\pi$ decreases

exponentially. More precisely, we show error bounds of the type

$\exp(-\beta N^{1/d})$ (or $\exp(-\beta N^{1/(d+1)})$ for neural

networks), where $N$ refers to the dimension of the ansatz space (or

the size of the network) containing $\tilde T$; the notion of

distance comprises, among others, the Hellinger distance and the

Kullback–Leibler divergence. The construction guarantees $\tilde T$

to be a monotone triangular bijective transport on the hypercube

$[-1,1]^d$. Analogous results hold for the inverse transport

$S=T^{-1}$. The proofs are constructive, and we give an explicit

\textit{a priori} description of the ansatz space, which can be used

for numerical implementations. Additionally we discuss the

high-dimensional case: for $d=\infty$ a dimension-independent

algebraic convergence rate is proved for a class of probability

measures occurring widely in Bayesian inference

for uncertainty quantification, thus verifying that the curse of

dimensionality can be overcome in this setting.

##### BibTeX entry

@article{zech-triangular-2020, author = "J. Zech and Y. M. Marzouk", journal = "Preprint", title = "Sparse approximation of triangular transports on bounded domains", year = "2020", abstract = "Let $\rho$ and $\pi$ be two probability measures on $[-1,1]^d$ with positive and analytic Lebesgue densities. We investigate the approximation of the unique triangular monotone (Knothe--Rosenblatt) transport $T:[-1,1]^d\to [-1,1]^d$, such that the pushforward $T_\sharp\rho$ equals $\pi$. It is shown that for $d\in\N$ there exist approximations $\tilde T$ of $T$ based on either sparse polynomial expansions or ReLU networks, such that the distance between $\tilde T_\sharp\rho$ and $\pi$ decreases exponentially. More precisely, we show error bounds of the type $\exp(-\beta N^{1/d})$ (or $\exp(-\beta N^{1/(d+1)})$ for neural networks), where $N$ refers to the dimension of the ansatz space (or the size of the network) containing $\tilde T$; the notion of distance comprises, among others, the Hellinger distance and the Kullback--Leibler divergence. The construction guarantees $\tilde T$ to be a monotone triangular bijective transport on the hypercube $[-1,1]^d$. Analogous results hold for the inverse transport $S=T^{-1}$. The proofs are constructive, and we give an explicit \textit{a priori} description of the ansatz space, which can be used for numerical implementations. Additionally we discuss the high-dimensional case: for $d=\infty$ a dimension-independent algebraic convergence rate is proved for a class of probability measures occurring widely in Bayesian inference for uncertainty quantification, thus verifying that the curse of dimensionality can be overcome in this setting.", keywords = "transport maps, domains of holomorphy, uncertainty quantification, sparse approximation, neural networks, sampling" }

*Rate-optimal refinement strategies for local approximation MCMC,*Preprint, (2020).

##### Rate-optimal refinement strategies for local approximation MCMC

Many Bayesian inference problems involve target distributions whose density functions are computationally expensive to evaluate. Replacing the target density with a local approximation based on a small number of carefully chosen density evaluations can significantly reduce the computational expense of Markov chain Monte Carlo (MCMC) sampling. Moreover, continual refinement of the local approximation can guarantee asymptotically exact sampling. We devise a new strategy for balancing the decay rate of the bias due to the approximation with that of the MCMC variance. We prove that the error of the resulting local approximation MCMC (LA-MCMC) algorithm decays at roughly the expected $1/\sqrt{T}$ rate, and we demonstrate this rate numerically. We also introduce an algorithmic parameter that guarantees convergence given very weak tail bounds, significantly strengthening previous convergence results. Finally, we apply LA-MCMC to a computationally intensive Bayesian inverse problem arising in groundwater hydrology.

##### BibTeX entry

@article{davis-lamcmc-2020, author = "A. Davis and Y. M. Marzouk and N. Pillai and A. Smith", journal = "Preprint", title = "Rate-optimal refinement strategies for local approximation MCMC", year = "2020", abstract = "Many Bayesian inference problems involve target distributions whose density functions are computationally expensive to evaluate. Replacing the target density with a local approximation based on a small number of carefully chosen density evaluations can significantly reduce the computational expense of Markov chain Monte Carlo (MCMC) sampling. Moreover, continual refinement of the local approximation can guarantee asymptotically exact sampling. We devise a new strategy for balancing the decay rate of the bias due to the approximation with that of the MCMC variance. We prove that the error of the resulting local approximation MCMC (LA-MCMC) algorithm decays at roughly the expected $1/\sqrt{T}$ rate, and we demonstrate this rate numerically. We also introduce an algorithmic parameter that guarantees convergence given very weak tail bounds, significantly strengthening previous convergence results. Finally, we apply LA-MCMC to a computationally intensive Bayesian inverse problem arising in groundwater hydrology.", keywords = "Markov chain Monte Carlo, local regression, Bayesian inference, surrogate models, sampling" }

*Conditional sampling with monotone GANs,*Preprint, (2020).

##### Conditional sampling with monotone GANs

We present a new approach for sampling conditional measures that enables uncertainty quantification in supervised learning tasks. We construct a mapping that transforms a reference measure to the probability measure of the output conditioned on new inputs. The mapping is trained via a modification of generative adversarial networks (GANs), called monotone GANs, that imposes monotonicity constraints and a block triangular structure. We present theoretical results, in an idealized setting, that support our proposed method as well as numerical experiments demonstrating the ability of our method to sample the correct conditional measures in applications ranging from inverse problems to image in-painting.

##### BibTeX entry

@article{kovachki-monotonegans-2020, author = "N. Kovachki and R. Baptista and B. Hosseini and Y. M. Marzouk", journal = "Preprint", title = "Conditional sampling with monotone GANs", year = "2020", abstract = "We present a new approach for sampling conditional measures that enables uncertainty quantification in supervised learning tasks. We construct a mapping that transforms a reference measure to the probability measure of the output conditioned on new inputs. The mapping is trained via a modification of generative adversarial networks (GANs), called monotone GANs, that imposes monotonicity constraints and a block triangular structure. We present theoretical results, in an idealized setting, that support our proposed method as well as numerical experiments demonstrating the ability of our method to sample the correct conditional measures in applications ranging from inverse problems to image in-painting." }

*Cross-entropy based importance sampling with failure-informed dimension reduction for rare event simulation,*Preprint, (2020).

##### Cross-entropy based importance sampling with failure-informed dimension reduction for rare event simulation

The estimation of rare event or failure probabilities in high dimensions is of interest in many areas of science and technology. We consider problems where the rare event is expressed in terms of a computationally costly numerical model. Importance sampling with the cross-entropy method offers an efficient way to address such problems provided that a suitable parametric family of biasing densities is employed. Although some existing parametric distribution families are designed to perform efficiently in high dimensions, their applicability within the cross-entropy method is limited to problems with dimension of O(10^2). In this work, rather than directly building sampling densities in high dimensions, we focus on identifying the intrinsic low-dimensional structure of the rare event simulation problem. To this end, we exploit a connection between rare event simulation and Bayesian inverse problems. This allows us to adapt dimension reduction techniques from Bayesian inference to construct new, effectively low-dimensional, biasing distributions within the cross-entropy method. In particular, we employ the approach in [Zahm et al. 2018], as it enables control of the error in the approximation of the optimal biasing distribution. We illustrate our method using two standard high-dimensional reliability benchmark problems and one structural mechanics application involving random fields.

##### BibTeX entry

@article{uribe-crossentropy-2020, author = "F. Uribe and I. Papaioannou and Y. M. Marzouk and D. Straub", journal = "Preprint", title = "Cross-entropy based importance sampling with failure-informed dimension reduction for rare event simulation", year = "2020", abstract = "The estimation of rare event or failure probabilities in high dimensions is of interest in many areas of science and technology. We consider problems where the rare event is expressed in terms of a computationally costly numerical model. Importance sampling with the cross-entropy method offers an efficient way to address such problems provided that a suitable parametric family of biasing densities is employed. Although some existing parametric distribution families are designed to perform efficiently in high dimensions, their applicability within the cross-entropy method is limited to problems with dimension of O(10^2). In this work, rather than directly building sampling densities in high dimensions, we focus on identifying the intrinsic low-dimensional structure of the rare event simulation problem. To this end, we exploit a connection between rare event simulation and Bayesian inverse problems. This allows us to adapt dimension reduction techniques from Bayesian inference to construct new, effectively low-dimensional, biasing distributions within the cross-entropy method. In particular, we employ the approach in [Zahm et al. 2018], as it enables control of the error in the approximation of the optimal biasing distribution. We illustrate our method using two standard high-dimensional reliability benchmark problems and one structural mechanics application involving random fields.", keywords = "rare event simulation, reliability analysis, likelihood-informed subspace, importance sampling, cross- entropy method, random fields" }

*Low-rank multi-parametric covariance estimation,*Preprint, (2020).

##### Low-rank multi-parametric covariance estimation

We propose a differential geometric construction for families of low-rank covariance matrices, via interpolation on low-rank matrix manifolds. In contrast with standard parametric covariance classes, these families offer significant flexibility for problem-specific tailoring via the choice of "anchor" matrices for the interpolation. Moreover, their low-rank facilitates computational tractability in high dimensions and with limited data. We employ these covariance families for both interpolation and identification, where the latter problem comprises selecting the most representative member of the covariance family given a data set. In this setting, standard procedures such as maximum likelihood estimation are nontrivial because the covariance family is rank-deficient; we resolve this issue by casting the identification problem as distance minimization. We demonstrate the power of these differential geometric families for interpolation and identification in a practical application: wind field covariance approximation for unmanned aerial vehicle navigation.

##### BibTeX entry

@article{musolas-lowrank-2020, author = "A. Musolas and E. Massart and J. Hendrickx and P.-A. Absil and Y. M. Marzouk", journal = "Preprint", title = "Low-rank multi-parametric covariance estimation", year = "2020", abstract = "We propose a differential geometric construction for families of low-rank covariance matrices, via interpolation on low-rank matrix manifolds. In contrast with standard parametric covariance classes, these families offer significant flexibility for problem-specific tailoring via the choice of "anchor" matrices for the interpolation. Moreover, their low-rank facilitates computational tractability in high dimensions and with limited data. We employ these covariance families for both interpolation and identification, where the latter problem comprises selecting the most representative member of the covariance family given a data set. In this setting, standard procedures such as maximum likelihood estimation are nontrivial because the covariance family is rank-deficient; we resolve this issue by casting the identification problem as distance minimization. We demonstrate the power of these differential geometric families for interpolation and identification in a practical application: wind field covariance approximation for unmanned aerial vehicle navigation." }

*Geodesically parameterized covariance estimation,*Preprint, (2020).

##### Geodesically parameterized covariance estimation

Statistical modeling of spatiotemporal phenomena often requires selecting a covariance matrix from a covariance class. Yet standard parametric covariance families can be insufficiently flexible for practical applications, while non-parametric approaches may not easily allow certain kinds of prior knowledge to be incorporated. We propose instead to build covariance families out of geodesic curves. These covariances offer more flexibility for problem-specific tailoring than classical parametric families, and are preferable to simple convex combinations. Once the covariance family has been chosen, one typically needs to select a representative member by solving an optimization problem, e.g., by maximizing the likelihood of a data set. We consider instead a differential geometric interpretation of this problem: minimizing the geodesic distance to a sample covariance matrix (‘‘natural projection’). Our approach is consistent with the notion of distance employed to build the covariance family and does not require assuming a particular probability distribution for the data. We show that natural projection and maximum likelihood are locally equivalent up to second order. We also demonstrate that natural projection may yield more accurate estimates with noise-corrupted data.

##### BibTeX entry

@article{musolas-fullrank-2020, author = "A. Musolas and S. T. Smith and Y. M. Marzouk", journal = "Preprint", title = "Geodesically parameterized covariance estimation", year = "2020", abstract = "Statistical modeling of spatiotemporal phenomena often requires selecting a covariance matrix from a covariance class. Yet standard parametric covariance families can be insufficiently flexible for practical applications, while non-parametric approaches may not easily allow certain kinds of prior knowledge to be incorporated. We propose instead to build covariance families out of geodesic curves. These covariances offer more flexibility for problem-specific tailoring than classical parametric families, and are preferable to simple convex combinations. Once the covariance family has been chosen, one typically needs to select a representative member by solving an optimization problem, e.g., by maximizing the likelihood of a data set. We consider instead a differential geometric interpretation of this problem: minimizing the geodesic distance to a sample covariance matrix (``natural projection'). Our approach is consistent with the notion of distance employed to build the covariance family and does not require assuming a particular probability distribution for the data. We show that natural projection and maximum likelihood are locally equivalent up to second order. We also demonstrate that natural projection may yield more accurate estimates with noise-corrupted data." }

*Coupling techniques for nonlinear ensemble filtering,*Preprint, (2019).

##### Coupling techniques for nonlinear ensemble filtering

We consider filtering in high-dimensional non-Gaussian state-space models with intractable transition kernels, nonlinear and possibly chaotic dynamics, and sparse observations in space and time. We propose a novel filtering methodology that harnesses transportation of measures, convex optimization, and ideas from probabilistic graphical models to yield robust ensemble approximations of the filtering distribution in high dimensions. Our approach can be understood as the natural generalization of the ensemble Kalman filter (EnKF) to nonlinear updates, using stochastic or deterministic couplings. The use of nonlinear updates can reduce the intrinsic bias of the EnKF at a marginal increase in computational cost. We avoid any form of importance sampling and introduce non-Gaussian localization approaches for dimension scalability. Our framework achieves state-of-the-art tracking performance on challenging configurations of the Lorenz-96 model in the chaotic regime.

##### BibTeX entry

@article{art_79, author = "A. Spantini and R. Baptista and Y. M. Marzouk", journal = "Preprint", title = "Coupling techniques for nonlinear ensemble filtering", year = "2019", abstract = "We consider filtering in high-dimensional non-Gaussian state-space models with intractable transition kernels, nonlinear and possibly chaotic dynamics, and sparse observations in space and time. We propose a novel filtering methodology that harnesses transportation of measures, convex optimization, and ideas from probabilistic graphical models to yield robust ensemble approximations of the filtering distribution in high dimensions. Our approach can be understood as the natural generalization of the ensemble Kalman filter (EnKF) to nonlinear updates, using stochastic or deterministic couplings. The use of nonlinear updates can reduce the intrinsic bias of the EnKF at a marginal increase in computational cost. We avoid any form of importance sampling and introduce non-Gaussian localization approaches for dimension scalability. Our framework achieves state-of-the-art tracking performance on challenging configurations of the Lorenz-96 model in the chaotic regime.", keywords = "nonlinear filtering, state-space models, couplings, transport maps, ensemble Kalman filter, graphical models, localization, approximate Bayesian computation" }

*A layered multiple importance sampling scheme for focused optimal Bayesian experimental design,*Preprint, (2019).

##### A layered multiple importance sampling scheme for focused optimal Bayesian experimental design

We develop a new computational approach for "focused" optimal Bayesian experimental design with nonlinear models, with the goal of maximizing expected information gain in targeted subsets of model parameters. Our approach considers uncertainty in the full set of model parameters, but employs a design objective that can exploit learning trade-offs among different parameter subsets. We introduce a new layered multiple importance sampling scheme that provides consistent estimates of expected information gain in this focused setting. This sampling scheme yields significant reductions in estimator bias and variance for a given computational effort, making optimal design more tractable for a wide range of computationally intensive problems.

##### BibTeX entry

@article{art_77, author = "C. Feng and Y. M. Marzouk", journal = "Preprint", title = "A layered multiple importance sampling scheme for focused optimal Bayesian experimental design", year = "2019", abstract = "We develop a new computational approach for "focused" optimal Bayesian experimental design with nonlinear models, with the goal of maximizing expected information gain in targeted subsets of model parameters. Our approach considers uncertainty in the full set of model parameters, but employs a design objective that can exploit learning trade-offs among different parameter subsets. We introduce a new layered multiple importance sampling scheme that provides consistent estimates of expected information gain in this focused setting. This sampling scheme yields significant reductions in estimator bias and variance for a given computational effort, making optimal design more tractable for a wide range of computationally intensive problems.", keywords = "optimal experimental design, Bayesian inference, Monte Carlo methods, multiple importance sampling, expected information gain, mutual information" }

*Certified dimension reduction in nonlinear Bayesian inverse problems,*Preprint, (2018).

##### Certified dimension reduction in nonlinear Bayesian inverse problems

We propose a dimension reduction technique for Bayesian inverse problems with nonlinear forward operators, non-Gaussian priors, and non-Gaussian observation noise. The likelihood function is approximated by a ridge function, i.e., a map which depends non-trivially only on a few linear combinations of the parameters. We build this ridge approximation by minimizing an upper bound on the Kullback–Leibler divergence between the posterior distribution and its approximation. This bound, obtained via logarithmic Sobolev inequalities, allows one to certify the error of the posterior approximation. Computing the bound requires computing the second moment matrix of the gradient of the log likelihood function. In practice, a sample-based approximation of the upper bound is then required. We provide an analysis that enables control of the posterior approximation error due to this sampling. Numerical and theoretical comparisons with existing methods illustrate the benefits of the proposed methodology.

##### BibTeX entry

@article{art_73, author = "O. Zahm and T. Cui and K. Law and A. Spantini and Y. M. Marzouk", journal = "Preprint", title = "Certified dimension reduction in nonlinear Bayesian inverse problems", year = "2018", abstract = "We propose a dimension reduction technique for Bayesian inverse problems with nonlinear forward operators, non-Gaussian priors, and non-Gaussian observation noise. The likelihood function is approximated by a ridge function, i.e., a map which depends non-trivially only on a few linear combinations of the parameters. We build this ridge approximation by minimizing an upper bound on the Kullback--Leibler divergence between the posterior distribution and its approximation. This bound, obtained via logarithmic Sobolev inequalities, allows one to certify the error of the posterior approximation. Computing the bound requires computing the second moment matrix of the gradient of the log likelihood function. In practice, a sample-based approximation of the upper bound is then required. We provide an analysis that enables control of the posterior approximation error due to this sampling. Numerical and theoretical comparisons with existing methods illustrate the benefits of the proposed methodology.", keywords = "dimension reduction, nonlinear Bayesian inverse problem, logarithmic Sobolev inequality, certified error bound, non-asymptotic analysis" }

*A trust region method for derivative-free nonlinear constrained stochastic optimization,*Preprint, (2017).

##### A trust region method for derivative-free nonlinear constrained stochastic optimization

We present the algorithm SNOWPAC for derivative-free constrained stochastic optimization. The algorithm builds on a model-based approach for deterministic nonlinear constrained derivative-free optimization that introduces an ‘‘inner boundary path’’ to locally convexify the feasible domain and ensure feasible trial steps. We extend this deterministic method via a generalized trust region approach that accounts for noisy evaluations of the objective and constraints. To reduce the impact of noise, we fit consistent Gaussian processes to past objective and constraint evaluations. Our approach incorporates a wide variety of probabilistic risk or deviation measures in both the objective and the constraints. Numerical benchmarking demonstrates SNOWPAC’s efficiency and highlights the accuracy of the optimization solutions found.

##### BibTeX entry

@article{art_55, author = "F. Augustin and Y. M. Marzouk", journal = "Preprint", title = "A trust region method for derivative-free nonlinear constrained stochastic optimization", year = "2017", abstract = "We present the algorithm SNOWPAC for derivative-free constrained stochastic optimization. The algorithm builds on a model-based approach for deterministic nonlinear constrained derivative-free optimization that introduces an ``inner boundary path'' to locally convexify the feasible domain and ensure feasible trial steps. We extend this deterministic method via a generalized trust region approach that accounts for noisy evaluations of the objective and constraints. To reduce the impact of noise, we fit consistent Gaussian processes to past objective and constraint evaluations. Our approach incorporates a wide variety of probabilistic risk or deviation measures in both the objective and the constraints. Numerical benchmarking demonstrates SNOWPAC's efficiency and highlights the accuracy of the optimization solutions found." }

*Sequential Bayesian optimal experimental design via approximate dynamic programming,*Preprint, (2016).

##### Sequential Bayesian optimal experimental design via approximate dynamic programming

The design of multiple experiments is commonly undertaken via suboptimal strategies, such as batch (open-loop) design that omits feedback or greedy (myopic) design that does not account for future effects. This paper introduces new strategies for the optimal design of sequential experiments. First, we rigorously formulate the general sequential optimal experimental design (sOED) problem as a dynamic program. Batch and greedy designs are shown to result from special cases of this formulation. We then focus on sOED for parameter inference, adopting a Bayesian formulation with an information theoretic design objective. To make the problem tractable, we develop new numerical approaches for nonlinear design with continuous parameter, design, and observation spaces. We approximate the optimal policy by using backward induction with regression to construct and refine value function approximations in the dynamic program. The proposed algorithm iteratively generates trajectories via exploration and exploitation to improve approximation accuracy in frequently visited regions of the state space. Numerical results are verified against analytical solutions in a linear-Gaussian setting. Advantages over batch and greedy design are then demonstrated on a nonlinear source inversion problem where we seek an optimal policy for sequential sensing.

##### BibTeX entry

@article{huan-sequential-2016, author = "X. Huan and Y. M. Marzouk", journal = "Preprint", title = "Sequential Bayesian optimal experimental design via approximate dynamic programming", year = "2016", abstract = "The design of multiple experiments is commonly undertaken via suboptimal strategies, such as batch (open-loop) design that omits feedback or greedy (myopic) design that does not account for future effects. This paper introduces new strategies for the optimal design of sequential experiments. First, we rigorously formulate the general sequential optimal experimental design (sOED) problem as a dynamic program. Batch and greedy designs are shown to result from special cases of this formulation. We then focus on sOED for parameter inference, adopting a Bayesian formulation with an information theoretic design objective. To make the problem tractable, we develop new numerical approaches for nonlinear design with continuous parameter, design, and observation spaces. We approximate the optimal policy by using backward induction with regression to construct and refine value function approximations in the dynamic program. The proposed algorithm iteratively generates trajectories via exploration and exploitation to improve approximation accuracy in frequently visited regions of the state space. Numerical results are verified against analytical solutions in a linear-Gaussian setting. Advantages over batch and greedy design are then demonstrated on a nonlinear source inversion problem where we seek an optimal policy for sequential sensing.", keywords = "sequential experimental design, Bayesian experimental design, approximate dynamic programming, feedback control policy, lookahead, approximate value iteration, information gain" }

*Bayesian level sets for image segmentation,*Preprint, (2015).

##### Bayesian level sets for image segmentation

This paper presents a new algorithm for image segmentation and classification, Bayesian Level Sets (BLS). BLS harnesses the advantages of two well-known algorithms: variational level sets and finite mixture model EM (FMM-EM). Like FMM-EM, BLS has a simple, probabilistic implementation which natively extends to an arbitrary number of classes. Via a level set-inspired geometric prior, BLS returns smooth, regular segmenting contours that are robust to noise. In practice, BLS is also observed to be robust to fairly lenient initial conditions. A comparative analysis of the three algorithms (BLS, level set, FMM-EM) is presented, and the advantages of BLS are quantitatively demonstrated on realistic applications such as pluripotent stem cell colonies, brain MRI phantoms, and stem cell nuclei.

##### BibTeX entry

@article{art_39, author = "N. Lowry and R. Mangoubi and M. Desai and Y. M. Marzouk and P. Sammak", journal = "Preprint", title = "Bayesian level sets for image segmentation", year = "2015", abstract = "This paper presents a new algorithm for image segmentation and classification, Bayesian Level Sets (BLS). BLS harnesses the advantages of two well-known algorithms: variational level sets and finite mixture model EM (FMM-EM). Like FMM-EM, BLS has a simple, probabilistic implementation which natively extends to an arbitrary number of classes. Via a level set-inspired geometric prior, BLS returns smooth, regular segmenting contours that are robust to noise. In practice, BLS is also observed to be robust to fairly lenient initial conditions. A comparative analysis of the three algorithms (BLS, level set, FMM-EM) is presented, and the advantages of BLS are quantitatively demonstrated on realistic applications such as pluripotent stem cell colonies, brain MRI phantoms, and stem cell nuclei." }

*NOWPAC: A provably convergent derivative-free nonlinear optimizer with path-augmented constraints,*Preprint, (2014).

##### NOWPAC: A provably convergent derivative-free nonlinear optimizer with path-augmented constraints

This paper proposes the algorithm NOWPAC (Nonlinear Optimization With Path-Augmented Constraints) for nonlinear constrained derivative-free optimization. The algorithm uses a trust region framework based on fully linear models for the objective function and the constraints. A new constraint-handling scheme based on an inner boundary path allows for the computation of feasible trial steps using models for the constraints. We prove that the iterates computed by NOWPAC converge to a local first order critical point. We also discuss the convergence of NOWPAC in situations where evaluations of the objective function or the constraints are inexact, e.g., corrupted by numerical errors. For this, we determine a rate of decay that the magnitude of these numerical errors must satisfy, while approaching the critical point, to guarantee convergence. In settings where adjusting the accuracy of the objective or constraint evaluations is not possible, as is often the case in practical applications, we introduce an error indicator to detect these regimes and prevent deterioration of the optimization results.

##### BibTeX entry

@article{augustin-nowpac-2014, author = "F. Augustin and Y. M. Marzouk", journal = "Preprint", title = "NOWPAC: A provably convergent derivative-free nonlinear optimizer with path-augmented constraints", year = "2014", abstract = "This paper proposes the algorithm NOWPAC (Nonlinear Optimization With Path-Augmented Constraints) for nonlinear constrained derivative-free optimization. The algorithm uses a trust region framework based on fully linear models for the objective function and the constraints. A new constraint-handling scheme based on an inner boundary path allows for the computation of feasible trial steps using models for the constraints. We prove that the iterates computed by NOWPAC converge to a local first order critical point. We also discuss the convergence of NOWPAC in situations where evaluations of the objective function or the constraints are inexact, e.g., corrupted by numerical errors. For this, we determine a rate of decay that the magnitude of these numerical errors must satisfy, while approaching the critical point, to guarantee convergence. In settings where adjusting the accuracy of the objective or constraint evaluations is not possible, as is often the case in practical applications, we introduce an error indicator to detect these regimes and prevent deterioration of the optimization results." }

**No articles found matching filter.**

### Published articles

*Greedy inference with structure-exploiting lazy maps,*Advances in Neural Information Processing Systems (NeurIPS), (2020).

##### Greedy inference with structure-exploiting lazy maps

We propose a framework for solving high-dimensional Bayesian inference problems using structure-exploiting low-dimensional transport maps or flows. These maps are confined to a low-dimensional subspace (hence, lazy), and the subspace is identified by minimizing an upper bound on the Kullback–Leibler divergence (hence, structured). Our framework provides a principled way of identifying and exploiting low-dimensional structure in an inference problem. It focuses the expressiveness of a transport map along the directions of most significant discrepancy from the posterior, and can be used to build deep compositions of lazy maps, where low-dimensional projections of the parameters are iteratively transformed to match the posterior. We prove weak convergence of the generated sequence of distributions to the posterior, and we demonstrate the benefits of the framework on challenging inference problems in machine learning and differential equations, using inverse autoregressive flows and polynomial maps as examples of the underlying density estimators.

##### BibTeX entry

@article{brennanbigoni-lazy-2020, author = "D. Bigoni and O. Zahm and A. Spantini and Y. M. Marzouk", journal = "Advances in Neural Information Processing Systems (NeurIPS)", title = "Greedy inference with structure-exploiting lazy maps", year = "2020", abstract = "We propose a framework for solving high-dimensional Bayesian inference problems using structure-exploiting low-dimensional transport maps or flows. These maps are confined to a low-dimensional subspace (hence, lazy), and the subspace is identified by minimizing an upper bound on the Kullback--Leibler divergence (hence, structured). Our framework provides a principled way of identifying and exploiting low-dimensional structure in an inference problem. It focuses the expressiveness of a transport map along the directions of most significant discrepancy from the posterior, and can be used to build deep compositions of lazy maps, where low-dimensional projections of the parameters are iteratively transformed to match the posterior. We prove weak convergence of the generated sequence of distributions to the posterior, and we demonstrate the benefits of the framework on challenging inference problems in machine learning and differential equations, using inverse autoregressive flows and polynomial maps as examples of the underlying density estimators.", keywords = "Bayesian inference, greedy approximation, transport" }

*Data-driven forward discretizations for Bayesian inversion,*Inverse Problems, 36 (2020), pp. 105008.

##### Data-driven forward discretizations for Bayesian inversion

This paper suggests a framework for the learning of discretizations of expensive forward models in Bayesian inverse problems. The main idea is to incorporate the parameters governing the discretization as part of the unknown to be estimated within the Bayesian machinery. We numerically show that in a variety of inverse problems arising in mechanical engineering, signal processing and the geosciences, the observations contain useful information to guide the choice of discretization.

##### BibTeX entry

@article{bigoni-discretization-2020, author = "D. Bigoni and Y. Chen and N. Garcia-Trillos and Y. M. Marzouk and D. Sanz-Alonso", journal = "Inverse Problems", number = "10", pages = "105008", title = "Data-driven forward discretizations for Bayesian inversion", volume = "36", year = "2020", doi = "10.1088/1361-6420/abb2fa", abstract = "This paper suggests a framework for the learning of discretizations of expensive forward models in Bayesian inverse problems. The main idea is to incorporate the parameters governing the discretization as part of the unknown to be estimated within the Bayesian machinery. We numerically show that in a variety of inverse problems arising in mechanical engineering, signal processing and the geosciences, the observations contain useful information to guide the choice of discretization." }

*Efficient multi-scale Gaussian process regression for massive remote sensing data with satGP v0.1.2.,*Geoscientific Model Development, 13 (2020), pp. 3439–3463.

##### Efficient multi-scale Gaussian process regression for massive remote sensing data with satGP v0.1.2.

Satellite remote sensing provides a global view to processes on Earth that has unique benefits compared to making measurements on the ground, such as global coverage and enormous data volume. The typical downsides are spatial and temporal gaps and potentially low data quality. Meaningful statistical inference from such data requires overcoming these problems and developing efficient and robust computational tools. We design and implement a computationally efficient multi-scale Gaussian process (GP) software package, satGP, geared towards remote sensing applications. The software is able to handle problems of enormous sizes and to compute marginals and sample from the random field conditioning on at least hundreds of millions of observations. This is achieved by optimizing the computation by, e.g., randomization and splitting the problem into parallel local subproblems which aggressively discard uninformative data.

We describe the mean function of the Gaussian process by approximating marginals of a Markov random field (MRF). Variability around the mean is modeled with a multi-scale covariance kernel, which consists of Matérn, exponential, and periodic components. We also demonstrate how winds can be used to inform covariances locally. The covariance kernel parameters are learned by calculating an approximate marginal maximum likelihood estimate, and the validity of both the multi-scale approach and the method used to learn the kernel parameters is verified in synthetic experiments.

We apply these techniques to a moderate size ozone data set produced by an atmospheric chemistry model and to the very large number of observations retrieved from the Orbiting Carbon Observatory 2 (OCO-2) satellite. The satGP software is released under an open-source license.

We describe the mean function of the Gaussian process by approximating marginals of a Markov random field (MRF). Variability around the mean is modeled with a multi-scale covariance kernel, which consists of Matérn, exponential, and periodic components. We also demonstrate how winds can be used to inform covariances locally. The covariance kernel parameters are learned by calculating an approximate marginal maximum likelihood estimate, and the validity of both the multi-scale approach and the method used to learn the kernel parameters is verified in synthetic experiments.

We apply these techniques to a moderate size ozone data set produced by an atmospheric chemistry model and to the very large number of observations retrieved from the Orbiting Carbon Observatory 2 (OCO-2) satellite. The satGP software is released under an open-source license.

##### BibTeX entry

@article{susiluoto-satGP-2020, author = "J. Susiluoto and A. Spantini and H. Haario and T. Härkönen and Y. M. Marzouk", journal = "Geoscientific Model Development", pages = "3439–3463", title = "Efficient multi-scale Gaussian process regression for massive remote sensing data with satGP v0.1.2.", volume = "13", year = "2020", doi = "10.5194/gmd-13-3439-2020", abstract = "Satellite remote sensing provides a global view to processes on Earth that has unique benefits compared to making measurements on the ground, such as global coverage and enormous data volume. The typical downsides are spatial and temporal gaps and potentially low data quality. Meaningful statistical inference from such data requires overcoming these problems and developing efficient and robust computational tools. We design and implement a computationally efficient multi-scale Gaussian process (GP) software package, satGP, geared towards remote sensing applications. The software is able to handle problems of enormous sizes and to compute marginals and sample from the random field conditioning on at least hundreds of millions of observations. This is achieved by optimizing the computation by, e.g., randomization and splitting the problem into parallel local subproblems which aggressively discard uninformative data. We describe the mean function of the Gaussian process by approximating marginals of a Markov random field (MRF). Variability around the mean is modeled with a multi-scale covariance kernel, which consists of Matérn, exponential, and periodic components. We also demonstrate how winds can be used to inform covariances locally. The covariance kernel parameters are learned by calculating an approximate marginal maximum likelihood estimate, and the validity of both the multi-scale approach and the method used to learn the kernel parameters is verified in synthetic experiments. We apply these techniques to a moderate size ozone data set produced by an atmospheric chemistry model and to the very large number of observations retrieved from the Orbiting Carbon Observatory 2 (OCO-2) satellite. The satGP software is released under an open-source license." }

*Inference of experimental radial impurity transport on Alcator C-Mod: Bayesian parameter estimation and model selection,*Nuclear Fusion, 60 (2020), pp. 126014.

##### Inference of experimental radial impurity transport on Alcator C-Mod: Bayesian parameter estimation and model selection

We present a fully Bayesian approach for the inference of radial profiles of impurity transport coefficients and compare its results to neoclassical, gyrofluid and gyrokinetic modeling. Using nested sampling, the Bayesian Impurity Transport InferencE (BITE) framework can handle complex parameter spaces with multiple possible solutions, offering great advantages in interpretative power and reliability with respect to previously demonstrated methods. BITE employs a forward model based on the pySTRAHL package, built on the success of the well-known STRAHL code [Dux, IPP Report, 2004], to simulate impurity transport in magnetically-confined plasmas. In this paper, we focus on calcium (Ca, Z=20) Laser Blow-Off injections into Alcator C-Mod plasmas. Multiple Ca atomic lines are diagnosed via high-resolution X-ray Imaging Crystal Spectroscopy and Vacuum Ultra-Violet measurements. We analyze a sawtoothing I-mode discharge for which neoclassical and turbulent (quasilinear and nonlinear) predictions are also obtained. We find good agreement in diffusion across the entire radial extent, while turbulent convection and density profile peaking are estimated to be larger in experiment than suggested by theory. Efforts and challenges associated with the inference of experimental pedestal impurity transport are discussed.

##### BibTeX entry

@article{sciortino-impurity-2020, author = "F. Sciortino and N. T. Howard and E. S. Marmar and T. Odstrcil and N. M. Cao and R. Dux and A. E. Hubbard and J. W. Hughes and J. H. Irby and Y. M. Marzouk and L. M. Milanese", journal = "Nuclear Fusion", pages = "126014", title = "Inference of experimental radial impurity transport on Alcator C-Mod: Bayesian parameter estimation and model selection", volume = "60", year = "2020", doi = "10.1088/1741-4326/abae85", abstract = "We present a fully Bayesian approach for the inference of radial profiles of impurity transport coefficients and compare its results to neoclassical, gyrofluid and gyrokinetic modeling. Using nested sampling, the Bayesian Impurity Transport InferencE (BITE) framework can handle complex parameter spaces with multiple possible solutions, offering great advantages in interpretative power and reliability with respect to previously demonstrated methods. BITE employs a forward model based on the pySTRAHL package, built on the success of the well-known STRAHL code [Dux, IPP Report, 2004], to simulate impurity transport in magnetically-confined plasmas. In this paper, we focus on calcium (Ca, Z=20) Laser Blow-Off injections into Alcator C-Mod plasmas. Multiple Ca atomic lines are diagnosed via high-resolution X-ray Imaging Crystal Spectroscopy and Vacuum Ultra-Violet measurements. We analyze a sawtoothing I-mode discharge for which neoclassical and turbulent (quasilinear and nonlinear) predictions are also obtained. We find good agreement in diffusion across the entire radial extent, while turbulent convection and density profile peaking are estimated to be larger in experiment than suggested by theory. Efforts and challenges associated with the inference of experimental pedestal impurity transport are discussed." }

*Gradient-based dimension reduction of multivariate vector-valued functions,*SIAM Journal on Scientific Computing, 42 (2020), pp. A534-A558.

##### Gradient-based dimension reduction of multivariate vector-valued functions

Multivariate functions encountered in high-dimensional uncertainty quantification problems often vary along a few dominant directions in the input parameter space. We propose a gradient-based method for detecting these directions and using them to construct {ridge approximations} of such functions, in a setting where the functions are vector-valued (e.g., taking values in $\mathbb{R}^n$). The methodology consists of minimizing an upper bound on the approximation error, obtained by {subspace Poincaré inequalities}. We provide a thorough mathematical analysis in the case where the parameter space is equipped with a Gaussian probability measure. The resulting method generalizes the notion of active subspaces associated with scalar-valued functions. A numerical illustration shows that using gradients of the function yields effective dimension reduction. We also show how the choice of norm on the codomain of the function has an impact on the function’s low-dimensional approximation.

##### BibTeX entry

@article{zahm-dimred-2020, author = "O. Zahm and P. Constantine and C. Prieur and Y. M. Marzouk", journal = "SIAM Journal on Scientific Computing", number = "1", pages = "A534-A558", title = "Gradient-based dimension reduction of multivariate vector-valued functions", volume = "42", year = "2020", abstract = "Multivariate functions encountered in high-dimensional uncertainty quantification problems often vary along a few dominant directions in the input parameter space. We propose a gradient-based method for detecting these directions and using them to construct {ridge approximations} of such functions, in a setting where the functions are vector-valued (e.g., taking values in $\mathbb{R}^n$). The methodology consists of minimizing an upper bound on the approximation error, obtained by {subspace Poincaré inequalities}. We provide a thorough mathematical analysis in the case where the parameter space is equipped with a Gaussian probability measure. The resulting method generalizes the notion of active subspaces associated with scalar-valued functions. A numerical illustration shows that using gradients of the function yields effective dimension reduction. We also show how the choice of norm on the codomain of the function has an impact on the function's low-dimensional approximation.", keywords = "high-dimensional function approximation, dimension reduction, active subspace, ridge approximation, Karhunen-Loève decomposition, Poincaré inequality" }

*Bayesian waveform-based calibration of high-pressure acoustic emission systems with ball drop measurements,*Geophysical Journal International, 221 (2020), pp. 20-36.

##### Bayesian waveform-based calibration of high-pressure acoustic emission systems with ball drop measurements

Acoustic emission (AE) is a widely used technology to study source mechanisms and material properties during high-pressure rock failure experiments. It is important to understand the physical quantities that acoustic emission sensors measure, as well as the response of these sensors as a function of frequency. This study calibrates the newly built AE system in the MIT Rock Physics Laboratory using a ball-bouncing system. Full waveforms of multi-bounce events due to ball drops are used to infer the transfer function of lead zirconate titanate (PZT) sensors in high pressure environments. Uncertainty in the sensor transfer functions is quantified using a waveform-based Bayesian approach. The quantification of \textit{in situ} sensor transfer functions makes it possible to apply full waveform analysis for acoustic emissions at high pressures.

##### BibTeX entry

@article{gu-waveform-2020, author = "C. Gu and U. Mok and Y. M. Marzouk and G. Prieto Gomez and F. Sheibani and J. B. Evans and B. Hager", journal = "Geophysical Journal International", pages = "20-36", title = "Bayesian waveform-based calibration of high-pressure acoustic emission systems with ball drop measurements", volume = "221", year = "2020", abstract = "Acoustic emission (AE) is a widely used technology to study source mechanisms and material properties during high-pressure rock failure experiments. It is important to understand the physical quantities that acoustic emission sensors measure, as well as the response of these sensors as a function of frequency. This study calibrates the newly built AE system in the MIT Rock Physics Laboratory using a ball-bouncing system. Full waveforms of multi-bounce events due to ball drops are used to infer the transfer function of lead zirconate titanate (PZT) sensors in high pressure environments. Uncertainty in the sensor transfer functions is quantified using a waveform-based Bayesian approach. The quantification of \textit{in situ} sensor transfer functions makes it possible to apply full waveform analysis for acoustic emissions at high pressures." }

*MALA-within-Gibbs samplers for high-dimensional distributions with sparse conditional structure,*SIAM Journal on Scientific Computing, 42 (2020), pp. A1765-A1788.

##### MALA-within-Gibbs samplers for high-dimensional distributions with sparse conditional structure

Markov chain Monte Carlo (MCMC) samplers are numerical methods for drawing samples from a given target probability distribution. We discuss one particular MCMC sampler, the MALA-within-Gibbs sampler, from the theoretical and practical perspectives. We first show that the acceptance ratio and step size of this sampler are independent of the overall problem dimension when (i) the target distribution has sparse conditional structure, and (ii) this structure is reflected in the partial updating strategy of MALA-within-Gibbs. If, in addition, the target density is block-wise log-concave, then the sampler’s convergence rate is independent of dimension. From a practical perspective, we expect that MALA-within-Gibbs is useful for solving high-dimensional Bayesian inference problems where the posterior exhibits sparse conditional structure at least approximately. In this context, a partitioning of the state that correctly reflects the sparse conditional structure must be found, and we illustrate this process in two numerical examples. We also discuss trade-offs between the block size used for partial updating and computational requirements that may increase with the number of blocks.

##### BibTeX entry

@article{tong-mala-2020, author = "X. T. Tong and M. Morzfeld and Y. M. Marzouk", journal = "SIAM Journal on Scientific Computing", number = "3", pages = "A1765-A1788", title = "MALA-within-Gibbs samplers for high-dimensional distributions with sparse conditional structure", volume = "42", year = "2020", abstract = "Markov chain Monte Carlo (MCMC) samplers are numerical methods for drawing samples from a given target probability distribution. We discuss one particular MCMC sampler, the MALA-within-Gibbs sampler, from the theoretical and practical perspectives. We first show that the acceptance ratio and step size of this sampler are independent of the overall problem dimension when (i) the target distribution has sparse conditional structure, and (ii) this structure is reflected in the partial updating strategy of MALA-within-Gibbs. If, in addition, the target density is block-wise log-concave, then the sampler's convergence rate is independent of dimension. From a practical perspective, we expect that MALA-within-Gibbs is useful for solving high-dimensional Bayesian inference problems where the posterior exhibits sparse conditional structure at least approximately. In this context, a partitioning of the state that correctly reflects the sparse conditional structure must be found, and we illustrate this process in two numerical examples. We also discuss trade-offs between the block size used for partial updating and computational requirements that may increase with the number of blocks.", keywords = "Bayesian computation, high-dimensional distribution, Markov chain Monte Carlo, sparsity, conditional independence, Metropolis-adjusted Langevin" }

*Multifidelity dimension reduction via active subspaces,*SIAM Journal on Scientific Computing, 42 (2020), pp. A929-A956.

##### Multifidelity dimension reduction via active subspaces

We propose a multifidelity dimension reduction method to identify a low-dimensional structure present in many engineering models. The structure of interest arises when functions vary primarily on a low-dimensional subspace of the high-dimensional input space, while varying little along the complementary directions. Our approach builds on the gradient-based methodology of active subspaces, and exploits models of different fidelities to reduce the cost of performing dimension reduction through the computation of the active subspace matrix. We provide a non-asymptotic analysis of the number of gradient evaluations sufficient to achieve a prescribed error in the active subspace matrix, both in expectation and with high probability. We show that the sample complexity depends on a notion of intrinsic dimension of the problem, which can be much smaller than the dimension of the input space. We illustrate the benefits of such a multifidelity dimension reduction approach using numerical experiments with input spaces of up to two thousand dimensions.

##### BibTeX entry

@article{lam-dimred-2020, author = "R. Lam and O. Zahm and Y. M. Marzouk and K. Willcox", journal = "SIAM Journal on Scientific Computing", number = "2", pages = "A929-A956", title = "Multifidelity dimension reduction via active subspaces", volume = "42", year = "2020", abstract = "We propose a multifidelity dimension reduction method to identify a low-dimensional structure present in many engineering models. The structure of interest arises when functions vary primarily on a low-dimensional subspace of the high-dimensional input space, while varying little along the complementary directions. Our approach builds on the gradient-based methodology of active subspaces, and exploits models of different fidelities to reduce the cost of performing dimension reduction through the computation of the active subspace matrix. We provide a non-asymptotic analysis of the number of gradient evaluations sufficient to achieve a prescribed error in the active subspace matrix, both in expectation and with high probability. We show that the sample complexity depends on a notion of intrinsic dimension of the problem, which can be much smaller than the dimension of the input space. We illustrate the benefits of such a multifidelity dimension reduction approach using numerical experiments with input spaces of up to two thousand dimensions.", keywords = "Dimension reduction, multifidelity, gradient-based, active subspace, intrinsic dimension, effective rank, matrix Bernstein inequality, control variate" }

*Scalable optimization-based sampling on function space,*SIAM Journal on Scientific Computing, 42 (2020), pp. A1317-1347.

##### Scalable optimization-based sampling on function space

Optimization-based samplers such as randomize-then-optimize (RTO) provide an efficient and parallellizable approach to solving large-scale Bayesian inverse problems. These methods solve randomly perturbed optimization problems to draw samples from an approximate posterior distribution. ‘‘Correcting’’ these samples, either by Metropolization or importance sampling, enables characterization of the original posterior distribution. This paper focuses on the scalability of RTO to problems with high- or infinite-dimensional parameters. In particular, we introduce a new subspace strategy to reformulate RTO. For problems with intrinsic low-rank structure, this subspace acceleration makes the computational complexity of RTO scale linearly with the parameter dimension. Furthermore, this subspace perspective suggests a natural extension of RTO to a function space setting. We thus formalize a function space version of RTO and establish sufficient conditions for it to produce a valid Metropolis–Hastings proposal, yielding dimension-independent sampling performance. Numerical examples corroborate the dimension-independence of RTO and demonstrate sampling performance that is also robust to small observational noise.

##### BibTeX entry

@article{art_76, author = "J. Bardsley and T. Cui and Y. M. Marzouk and Z. Wang", journal = "SIAM Journal on Scientific Computing", number = "2", pages = "A1317-1347", title = "Scalable optimization-based sampling on function space", volume = "42", year = "2020", abstract = "Optimization-based samplers such as randomize-then-optimize (RTO) provide an efficient and parallellizable approach to solving large-scale Bayesian inverse problems. These methods solve randomly perturbed optimization problems to draw samples from an approximate posterior distribution. ``Correcting'' these samples, either by Metropolization or importance sampling, enables characterization of the original posterior distribution. This paper focuses on the scalability of RTO to problems with high- or infinite-dimensional parameters. In particular, we introduce a new subspace strategy to reformulate RTO. For problems with intrinsic low-rank structure, this subspace acceleration makes the computational complexity of RTO scale linearly with the parameter dimension. Furthermore, this subspace perspective suggests a natural extension of RTO to a function space setting. We thus formalize a function space version of RTO and establish sufficient conditions for it to produce a valid Metropolis--Hastings proposal, yielding dimension-independent sampling performance. Numerical examples corroborate the dimension-independence of RTO and demonstrate sampling performance that is also robust to small observational noise.", keywords = "Markov chain Monte Carlo, Metropolis independence sampling, Bayesian inference, infinite-dimensional inverse problems" }

*On the importance of model selection when inferring impurity transport coefficient profiles,*Plasma Physics and Controlled Fusion, 61 (2019), pp. 125012.

##### On the importance of model selection when inferring impurity transport coefficient profiles

We present an analysis which suggests that model selection is a critical ingredient for successful reconstruction of impurity transport coefficient profiles, $D$ and $V$, from experimental data. Determining these quantities is a challenging nonlinear inverse problem. We use synthetic data to show that this problem is ill-posed, and hence $D$ and $V$ are not recommended for use in validation metrics unless the data analysis procedure goes to great lengths to account for the possibility that there are multiple possible solutions. In particular, inferred profiles which are very different from the true ones yield seemingly reasonable goodness-of-fit for synthetic X-ray spectrometer data. We present a Bayesian approach for inferring $D$ and $V$ which provides a rigorous means of selecting the level of complexity of the inferred profiles, thereby enabling successful reconstruction of the profiles.

##### BibTeX entry

@article{art_75, author = "M. Chilenski and M. Greenwald and Y. M. Marzouk and J. Rice and A. E. White", journal = "Plasma Physics and Controlled Fusion", month = "11", pages = "125012", title = "On the importance of model selection when inferring impurity transport coefficient profiles", volume = "61", year = "2019", doi = "10.1088/1361-6587/ab4e69", abstract = "We present an analysis which suggests that model selection is a critical ingredient for successful reconstruction of impurity transport coefficient profiles, $D$ and $V$, from experimental data. Determining these quantities is a challenging nonlinear inverse problem. We use synthetic data to show that this problem is ill-posed, and hence $D$ and $V$ are not recommended for use in validation metrics unless the data analysis procedure goes to great lengths to account for the possibility that there are multiple possible solutions. In particular, inferred profiles which are very different from the true ones yield seemingly reasonable goodness-of-fit for synthetic X-ray spectrometer data. We present a Bayesian approach for inferring $D$ and $V$ which provides a rigorous means of selecting the level of complexity of the inferred profiles, thereby enabling successful reconstruction of the profiles." }

*A transport-based multifidelity preconditioner for Markov chain Monte Carlo,*Advances in Computational Mathematics, 45 (2019), pp. 2321–2348.

##### A transport-based multifidelity preconditioner for Markov chain Monte Carlo

Markov chain Monte Carlo (MCMC) sampling of posterior distributions arising in Bayesian inverse problems is challenging when evaluations of the forward model are computationally expensive. Replacing the forward model with a low-cost, low-fidelity model often significantly reduces computational cost; however, employing a low-fidelity model alone means that the stationary distribution of the MCMC chain is the posterior distribution corresponding to the low-fidelity model, rather than the original posterior distribution corresponding to the high-fidelity model. We propose a multifidelity approach that combines, rather than replaces, the high-fidelity model with a low-fidelity model. First, the low-fidelity model is used to construct a transport map that deterministically couples a reference Gaussian distribution with an approximation of the low-fidelity posterior. Then, the high-fidelity posterior distribution is explored using a non-Gaussian proposal distribution derived from the transport map. This multifidelity "preconditioned" MCMC approach seeks efficient sampling via a proposal that is explicitly tailored to the posterior at hand and that is constructed efficiently with the low-fidelity model. By relying on the low-fidelity model only to construct the proposal distribution, our approach guarantees that the stationary distribution of the MCMC chain is the high-fidelity posterior. In our numerical examples, our multifidelity approach achieves significant speedups compared to single-fidelity MCMC sampling methods.

##### BibTeX entry

@article{art_74, author = "B. Peherstorfer and Y. M. Marzouk", journal = "Advances in Computational Mathematics", month = "12", number = "5-6", pages = "2321--2348", title = "A transport-based multifidelity preconditioner for Markov chain Monte Carlo", volume = "45", year = "2019", doi = "10.1007/s10444-019-09711-y", abstract = "Markov chain Monte Carlo (MCMC) sampling of posterior distributions arising in Bayesian inverse problems is challenging when evaluations of the forward model are computationally expensive. Replacing the forward model with a low-cost, low-fidelity model often significantly reduces computational cost; however, employing a low-fidelity model alone means that the stationary distribution of the MCMC chain is the posterior distribution corresponding to the low-fidelity model, rather than the original posterior distribution corresponding to the high-fidelity model. We propose a multifidelity approach that combines, rather than replaces, the high-fidelity model with a low-fidelity model. First, the low-fidelity model is used to construct a transport map that deterministically couples a reference Gaussian distribution with an approximation of the low-fidelity posterior. Then, the high-fidelity posterior distribution is explored using a non-Gaussian proposal distribution derived from the transport map. This multifidelity "preconditioned" MCMC approach seeks efficient sampling via a proposal that is explicitly tailored to the posterior at hand and that is constructed efficiently with the low-fidelity model. By relying on the low-fidelity model only to construct the proposal distribution, our approach guarantees that the stationary distribution of the MCMC chain is the high-fidelity posterior. In our numerical examples, our multifidelity approach achieves significant speedups compared to single-fidelity MCMC sampling methods.", keywords = "Bayesian inverse problems, transport maps, multifidelity, model reduction, Markov chain Monte Carlo" }

*Exploiting network topology for large-scale inference of nonlinear reaction models,*Journal of the Royal Society: Interface, 16 (2019), pp. 20180766.

##### Exploiting network topology for large-scale inference of nonlinear reaction models

The development of chemical reaction models aids understanding and prediction in areas ranging from biology to electrochemistry and combustion. A systematic approach to building reaction network models uses observational data not only to estimate unknown parameters, but also to learn model structure. Bayesian inference provides a natural approach to this data-driven construction of models. Yet traditional Bayesian model inference methodologies that numerically evaluate the evidence for each model are often infeasible for nonlinear reaction network inference, as the number of plausible models can be combinatorially large. Alternative approaches based on model-space sampling can enable large-scale network inference, but their realization presents many challenges. In this paper, we present new computational methods that make large-scale nonlinear network inference tractable. First, we exploit the topology of networks describing potential interactions among chemical species to design improved "between-model" proposals for reversible-jump Markov chain Monte Carlo. Second, we introduce a sensitivity-based determination of move types which, when combined with network-aware proposals, yields significant additional gains in sampling performance. These algorithms are demonstrated on inference problems drawn from systems biology, with nonlinear differential equation models of species interactions.

##### BibTeX entry

@article{art_72, author = "N. Galagali and Y. M. Marzouk", journal = "Journal of the Royal Society: Interface", month = "3", number = "152", pages = "20180766", title = "Exploiting network topology for large-scale inference of nonlinear reaction models", volume = "16", year = "2019", doi = "https://doi.org/10.1098/rsif.2018.0766", abstract = "The development of chemical reaction models aids understanding and prediction in areas ranging from biology to electrochemistry and combustion. A systematic approach to building reaction network models uses observational data not only to estimate unknown parameters, but also to learn model structure. Bayesian inference provides a natural approach to this data-driven construction of models. Yet traditional Bayesian model inference methodologies that numerically evaluate the evidence for each model are often infeasible for nonlinear reaction network inference, as the number of plausible models can be combinatorially large. Alternative approaches based on model-space sampling can enable large-scale network inference, but their realization presents many challenges. In this paper, we present new computational methods that make large-scale nonlinear network inference tractable. First, we exploit the topology of networks describing potential interactions among chemical species to design improved "between-model" proposals for reversible-jump Markov chain Monte Carlo. Second, we introduce a sensitivity-based determination of move types which, when combined with network-aware proposals, yields significant additional gains in sampling performance. These algorithms are demonstrated on inference problems drawn from systems biology, with nonlinear differential equation models of species interactions.", keywords = "reaction network, network inference, model selection, Bayesian inference, reversible-jump MCMC." }

*Localization for MCMC: sampling high-dimensional posterior distributions with local structure,*Journal of Computational Physics, 380 (2019), pp. 1–28.

##### Localization for MCMC: sampling high-dimensional posterior distributions with local structure

We investigate how ideas from covariance localization in numerical weather prediction can be used in Markov chain Monte Carlo (MCMC) sampling of high-dimensional posterior distributions arising in Bayesian inverse problems. To localize an inverse problem is to enforce an anticipated "local" structure by (i) neglecting small off-diagonal elements of the prior precision and covariance matrices; and (ii) restricting the influence of observations to their neighborhood. For linear problems we can specify the conditions under which posterior moments of the localized problem are close to those of the original problem. We explain physical interpretations of our assumptions about local structure and discuss the notion of high dimensionality in local problems, which is different from the usual notion of high dimensionality in function space MCMC. The Gibbs sampler is a natural choice of MCMC algorithm for localized inverse problems and we demonstrate that its convergence rate is independent of dimension for localized linear problems. Nonlinear problems can also be tackled efficiently by localization and, as a simple illustration of these ideas, we present a localized Metropolis-within-Gibbs sampler. Several linear and nonlinear numerical examples illustrate localization in the context of MCMC samplers for inverse problems.

##### BibTeX entry

@article{art_67, author = "M. Morzfeld and X. T. Tong and Y. M. Marzouk", journal = "Journal of Computational Physics", pages = "1--28", title = "Localization for MCMC: sampling high-dimensional posterior distributions with local structure", volume = "380", year = "2019", doi = "10.1016/j.jcp.2018.12.008", abstract = "We investigate how ideas from covariance localization in numerical weather prediction can be used in Markov chain Monte Carlo (MCMC) sampling of high-dimensional posterior distributions arising in Bayesian inverse problems. To localize an inverse problem is to enforce an anticipated "local" structure by (i) neglecting small off-diagonal elements of the prior precision and covariance matrices; and (ii) restricting the influence of observations to their neighborhood. For linear problems we can specify the conditions under which posterior moments of the localized problem are close to those of the original problem. We explain physical interpretations of our assumptions about local structure and discuss the notion of high dimensionality in local problems, which is different from the usual notion of high dimensionality in function space MCMC. The Gibbs sampler is a natural choice of MCMC algorithm for localized inverse problems and we demonstrate that its convergence rate is independent of dimension for localized linear problems. Nonlinear problems can also be tackled efficiently by localization and, as a simple illustration of these ideas, we present a localized Metropolis-within-Gibbs sampler. Several linear and nonlinear numerical examples illustrate localization in the context of MCMC samplers for inverse problems.", keywords = "Markov chain Monte Carlo, Bayesian inverse problems, high dimensions, localization, dimension-independent convergence" }

*A continuous analogue of the tensor-train decomposition,*Computer Methods in Applied Mechanics and Engineering, 347 (2019), pp. 59–84.

##### A continuous analogue of the tensor-train decomposition

We develop new approximation algorithms and data structures for representing and computing with multivariate functions using the functional tensor-train (FT), a continuous extension of the tensor-train (TT) decomposition. The FT represents functions using a tensor-train ansatz by replacing the three-dimensional TT cores with univariate matrix-valued functions. The main contribution of this paper is a framework to compute the FT that employs adaptive approximations of univariate fibers, and that is not tied to any tensorized discretization. The algorithm can be coupled with any univariate linear or nonlinear approximation procedure. We demonstrate that this approach can generate multivariate function approximations that are several orders of magnitude more accurate, for the same cost, than those based on the conventional approach of compressing the coefficient tensor of a tensor-product basis. Our approach is in the spirit of other continuous computation packages such as Chebfun, and yields an algorithm which requires the computation of “continuous” matrix factorizations such as the LU and QR decompositions of vector-valued functions. To support these developments, we describe continuous versions of an approximate maximum-volume cross approximation algorithm and of a rounding algorithm that re-approximates an FT by one of lower ranks. We demonstrate that our technique improves accuracy and robustness, compared to TT and quantics-TT approaches with fixed parameterizations, of high-dimensional integration, differentiation, and approximation of functions with local features such as discontinuities and other nonlinearities.

##### BibTeX entry

@article{art_70, author = "A. Gorodetsky and S. Karaman and Y. M. Marzouk", journal = "Computer Methods in Applied Mechanics and Engineering", pages = "59--84", title = "A continuous analogue of the tensor-train decomposition", volume = "347", year = "2019", doi = "10.1016/j.cma.2018.12.015", abstract = "We develop new approximation algorithms and data structures for representing and computing with multivariate functions using the functional tensor-train (FT), a continuous extension of the tensor-train (TT) decomposition. The FT represents functions using a tensor-train ansatz by replacing the three-dimensional TT cores with univariate matrix-valued functions. The main contribution of this paper is a framework to compute the FT that employs adaptive approximations of univariate fibers, and that is not tied to any tensorized discretization. The algorithm can be coupled with any univariate linear or nonlinear approximation procedure. We demonstrate that this approach can generate multivariate function approximations that are several orders of magnitude more accurate, for the same cost, than those based on the conventional approach of compressing the coefficient tensor of a tensor-product basis. Our approach is in the spirit of other continuous computation packages such as Chebfun, and yields an algorithm which requires the computation of “continuous” matrix factorizations such as the LU and QR decompositions of vector-valued functions. To support these developments, we describe continuous versions of an approximate maximum-volume cross approximation algorithm and of a rounding algorithm that re-approximates an FT by one of lower ranks. We demonstrate that our technique improves accuracy and robustness, compared to TT and quantics-TT approaches with fixed parameterizations, of high-dimensional integration, differentiation, and approximation of functions with local features such as discontinuities and other nonlinearities.", keywords = "tensor decompositions, tensor-train, Chebfun, high-dimensional approximation" }

*Quantifying kinetic uncertainty in turbulent combustion simulations using active subspaces,*Proceedings of the Combustion Institute, 37 (2019), pp. 2175–2182.

##### Quantifying kinetic uncertainty in turbulent combustion simulations using active subspaces

Uncertainty quantification in expensive turbulent combustion simulations usually adopts response surface techniques to accelerate Monte Carlo sampling. However, it is computationally intractable to build response surfaces for high-dimensional kinetic parameters. We employ the active subspaces approach to reduce the dimension of the parameter space, such that building a response surface on the resulting low-dimensional subspace requires many fewer runs of the expensive simulation, rendering the approach suitable for various turbulent combustion models. We demonstrate this approach in simulations of the Cabra H2/N2 jet flame, propagating the uncertainties of 21 kinetic parameters to the liftoff height. We identify a one-dimensional active subspace for the liftoff height using 84 runs of the simulations, from which a response surface with a one-dimensional input is built; the probability distribution of the liftoff height is then characterized by evaluating a large number of samples using the inexpensive response surface. In addition, the active subspace provides a global sensitivity metric for determining the most influential reactions. Comparison with autoignition tests reveals that the sensitivities to the HO2-related reactions in the Cabra flame are promoted by the diffusion processes. The present work demonstrates the capability of active subspaces in quantifying uncertainty in turbulent combustion simulations and provides physical insights into the flame via the active subspace-based sensitivity metric.

##### BibTeX entry

@article{art_69, author = "W. Ji and Z. Ren and Y. M. Marzouk and C. K. Law", journal = "Proceedings of the Combustion Institute", number = "2", pages = "2175--2182", title = "Quantifying kinetic uncertainty in turbulent combustion simulations using active subspaces", volume = "37", year = "2019", doi = "10.1016/j.proci.2018.06.206", abstract = "Uncertainty quantification in expensive turbulent combustion simulations usually adopts response surface techniques to accelerate Monte Carlo sampling. However, it is computationally intractable to build response surfaces for high-dimensional kinetic parameters. We employ the active subspaces approach to reduce the dimension of the parameter space, such that building a response surface on the resulting low-dimensional subspace requires many fewer runs of the expensive simulation, rendering the approach suitable for various turbulent combustion models. We demonstrate this approach in simulations of the Cabra H2/N2 jet flame, propagating the uncertainties of 21 kinetic parameters to the liftoff height. We identify a one-dimensional active subspace for the liftoff height using 84 runs of the simulations, from which a response surface with a one-dimensional input is built; the probability distribution of the liftoff height is then characterized by evaluating a large number of samples using the inexpensive response surface. In addition, the active subspace provides a global sensitivity metric for determining the most influential reactions. Comparison with autoignition tests reveals that the sensitivities to the HO2-related reactions in the Cabra flame are promoted by the diffusion processes. The present work demonstrates the capability of active subspaces in quantifying uncertainty in turbulent combustion simulations and provides physical insights into the flame via the active subspace-based sensitivity metric.", keywords = "uncertainty quantification, active subspaces, turbulent lifted flames, autoignition" }

*A Stein variational Newton method,*Advances in Neural Information Processing Systems (NeurIPS), 31 (2018).

##### A Stein variational Newton method

Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm [Liu & Wang, NIPS 2016]: it minimizes the Kullback-Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space. In this paper, we accelerate and generalize the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space. We also show how second-order information can lead to more effective choices of kernel. We observe significant computational gains over the original SVGD algorithm in multiple test cases.

##### BibTeX entry

@article{detomasso-svn-2020, author = "G. Detomasso and T. Cui and Y. M. Marzouk and R. Scheichl and A. Spantini", journal = "Advances in Neural Information Processing Systems (NeurIPS)", title = "A Stein variational Newton method", volume = "31", year = "2018", abstract = "Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm [Liu & Wang, NIPS 2016]: it minimizes the Kullback-Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space. In this paper, we accelerate and generalize the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space. We also show how second-order information can lead to more effective choices of kernel. We observe significant computational gains over the original SVGD algorithm in multiple test cases.", keywords = "variational inference, Stein's method, transport, nonparametric approximation, gradient descent, Newton method" }

*Transport map accelerated Markov chain Monte Carlo,*SIAM/ASA Journal on Uncertainty Quantification, 6 (2018), pp. 645–682.

##### Transport map accelerated Markov chain Monte Carlo

We introduce a new framework for efficient sampling from complex probability distributions, using a combination of transport maps and the Metropolis-Hastings rule. The core idea is to use deterministic couplings to transform typical Metropolis proposal mechanisms (e.g., random walks, Langevin methods) into non-Gaussian proposal distributions that can more effectively explore the target density. Our approach adaptively constructs a lower triangular transport map–-an approximation of the Knothe-Rosenblatt rearrangement–-using information from previous Markov chain Monte Carlo (MCMC) states, via the solution of an optimization problem. This optimization problem is convex regardless of the form of the target distribution, and can be solved efficiently without gradient information from the target probability distribution; the target distribution is instead represented via samples. Sequential updates enable efficient and parallelizable adaptation of the map even for large numbers of samples. We show that this approach uses inexact or truncated maps to produce an adaptive MCMC algorithm that is ergodic for the exact target distribution. Numerical demonstrations on a range of parameter inference problems show order-of-magnitude speedups over standard MCMC techniques, measured by the number of effectively independent samples produced per target density evaluation and per unit of wallclock time.

##### BibTeX entry

@article{art_66, author = "M. Parno and Y. M. Marzouk", journal = "SIAM/ASA Journal on Uncertainty Quantification", number = "2", pages = "645--682", title = "Transport map accelerated Markov chain Monte Carlo", volume = "6", year = "2018", doi = "10.1137/17M1134640", abstract = "We introduce a new framework for efficient sampling from complex probability distributions, using a combination of transport maps and the Metropolis-Hastings rule. The core idea is to use deterministic couplings to transform typical Metropolis proposal mechanisms (e.g., random walks, Langevin methods) into non-Gaussian proposal distributions that can more effectively explore the target density. Our approach adaptively constructs a lower triangular transport map---an approximation of the Knothe-Rosenblatt rearrangement---using information from previous Markov chain Monte Carlo (MCMC) states, via the solution of an optimization problem. This optimization problem is convex regardless of the form of the target distribution, and can be solved efficiently without gradient information from the target probability distribution; the target distribution is instead represented via samples. Sequential updates enable efficient and parallelizable adaptation of the map even for large numbers of samples. We show that this approach uses inexact or truncated maps to produce an adaptive MCMC algorithm that is ergodic for the exact target distribution. Numerical demonstrations on a range of parameter inference problems show order-of-magnitude speedups over standard MCMC techniques, measured by the number of effectively independent samples produced per target density evaluation and per unit of wallclock time.", keywords = "adaptive MCMC, Bayesian inference, measure transformation, optimal transport" }

*An approximate empirical Bayesian method for large-scale linear-Gaussian inverse problems,*Inverse Problems, 34 (2018), pp. 095001.

##### An approximate empirical Bayesian method for large-scale linear-Gaussian inverse problems

We study Bayesian inference methods for solving linear inverse problems, focusing on hierarchical formulations where the prior or the likelihood function depend on unspecified hyperparameters. In practice, these hyperparameters are often determined via an empirical Bayesian method that maximizes the marginal likelihood function, i.e., the probability density of the data conditional on the hyperparameters. Evaluating the marginal likelihood, however, is computationally challenging for large-scale problems. In this work, we present a method to approximately evaluate marginal likelihood functions, based on a low-rank approximation of the update from the prior covariance to the posterior covariance. We show that this approximation is optimal in a minimax sense. Moreover, we provide an efficient algorithm to implement the proposed method, based on a combination of the randomized SVD and a spectral approximation method to compute square roots of the prior covariance matrix. Several numerical examples demonstrate good performance of the proposed method.

##### BibTeX entry

@article{art_65, author = "Q. Zhou and W. Liu and J. Li and Y. M. Marzouk", journal = "Inverse Problems", number = "9", pages = "095001", title = "An approximate empirical Bayesian method for large-scale linear-Gaussian inverse problems", volume = "34", year = "2018", doi = "10.1088/1361-6420/aac287", abstract = "We study Bayesian inference methods for solving linear inverse problems, focusing on hierarchical formulations where the prior or the likelihood function depend on unspecified hyperparameters. In practice, these hyperparameters are often determined via an empirical Bayesian method that maximizes the marginal likelihood function, i.e., the probability density of the data conditional on the hyperparameters. Evaluating the marginal likelihood, however, is computationally challenging for large-scale problems. In this work, we present a method to approximately evaluate marginal likelihood functions, based on a low-rank approximation of the update from the prior covariance to the posterior covariance. We show that this approximation is optimal in a minimax sense. Moreover, we provide an efficient algorithm to implement the proposed method, based on a combination of the randomized SVD and a spectral approximation method to compute square roots of the prior covariance matrix. Several numerical examples demonstrate good performance of the proposed method." }

*Optimal approximations of coupling in multidisciplinary models,*AIAA Journal, 56 (2018), pp. 2412–2428.

##### Optimal approximations of coupling in multidisciplinary models

This paper presents a methodology for identifying important discipline couplings in multicomponent engineering systems. Coupling among disciplines contributes significantly to the computational cost of analyzing a system, and can become particularly burdensome when coupled analyses are embedded within a design or optimization loop. In many cases, disciplines may be weakly coupled, so that some of the coupling or interaction terms can be neglected without significantly. impacting the accuracy of the system output. Typical practice derives such approximations in an ad hoc manner using expert opinion and domain experience. This work proposes a new approach that formulates an optimization problem to find a model that optimally balances accuracy of the model outputs with the sparsity of the discipline couplings. An adaptive sequential Monte Carlo. sampling-based technique is used to efficiently search the combinatorial model space of different discipline couplings. An algorithm for selecting an optimal model is presented and illustrated in a fire detection satellite model and a turbine engine cycle analysis model.

##### BibTeX entry

@article{art_64, author = "R. Baptista and Y. M. Marzouk and K. Willcox and B. Peherstorfer", journal = "AIAA Journal", number = "6", pages = "2412--2428", title = "Optimal approximations of coupling in multidisciplinary models", volume = "56", year = "2018", doi = "10.2514/1.J056888", abstract = "This paper presents a methodology for identifying important discipline couplings in multicomponent engineering systems. Coupling among disciplines contributes significantly to the computational cost of analyzing a system, and can become particularly burdensome when coupled analyses are embedded within a design or optimization loop. In many cases, disciplines may be weakly coupled, so that some of the coupling or interaction terms can be neglected without significantly. impacting the accuracy of the system output. Typical practice derives such approximations in an ad hoc manner using expert opinion and domain experience. This work proposes a new approach that formulates an optimization problem to find a model that optimally balances accuracy of the model outputs with the sparsity of the discipline couplings. An adaptive sequential Monte Carlo. sampling-based technique is used to efficiently search the combinatorial model space of different discipline couplings. An algorithm for selecting an optimal model is presented and illustrated in a fire detection satellite model and a turbine engine cycle analysis model." }

*Conditional classifiers and boosted conditional Gaussian mixture models for novelty detection,*Pattern Recognition, 81 (2018), pp. 601–614.

##### Conditional classifiers and boosted conditional Gaussian mixture models for novelty detection

Novelty detection is an important task in a variety of applications such as object recognition, defect localization, medical diagnostics, and event detection. The objective of novelty detection is to distinguish one class, for which data are available, from all other possible classes when there is insufficient information to build an explicit model for the latter. The data from the observed class are usually represented in terms of certain features which can be modeled as random variables (RV). An important challenge for novelty detection in multivariate problems is characterizing the statistical dependencies among these RVs. Failure to consider these dependencies may lead to inaccurate predictions, usually in the form of high false positive rates. In this study, we propose conditional classifiers as a new approach for novelty detection that is capable of accounting for statistical dependencies of the relevant RVs without simplifying assumptions. To implement the proposed idea, we use Gaussian mixture models (GMM) along with forward stage-wise additive modeling and boosting methods to learn the conditional densities of RVs that represent our observed data. The resulting model, which is called a boosted conditional GMM, is then used as a basis for classification. To test the performance of the proposed method, we apply it to a realistic application problem for analyzing sensor networks and compare the results to those obtained with a one-class support vector machine.

##### BibTeX entry

@article{art_63, author = "R. Mohammadi-Ghazi and Y. M. Marzouk and Oral Büyüköztürk", journal = "Pattern Recognition", pages = "601--614", title = "Conditional classifiers and boosted conditional Gaussian mixture models for novelty detection", volume = "81", year = "2018", doi = "10.1016/j.patcog.2018.03.022", abstract = "Novelty detection is an important task in a variety of applications such as object recognition, defect localization, medical diagnostics, and event detection. The objective of novelty detection is to distinguish one class, for which data are available, from all other possible classes when there is insufficient information to build an explicit model for the latter. The data from the observed class are usually represented in terms of certain features which can be modeled as random variables (RV). An important challenge for novelty detection in multivariate problems is characterizing the statistical dependencies among these RVs. Failure to consider these dependencies may lead to inaccurate predictions, usually in the form of high false positive rates. In this study, we propose conditional classifiers as a new approach for novelty detection that is capable of accounting for statistical dependencies of the relevant RVs without simplifying assumptions. To implement the proposed idea, we use Gaussian mixture models (GMM) along with forward stage-wise additive modeling and boosting methods to learn the conditional densities of RVs that represent our observed data. The resulting model, which is called a boosted conditional GMM, is then used as a basis for classification. To test the performance of the proposed method, we apply it to a realistic application problem for analyzing sensor networks and compare the results to those obtained with a one-class support vector machine.", keywords = "Novelty detection, mixture models, graphical models, conditional dependence, conditional density, additive modeling, boosting, false positive" }

*Multilevel sequential Monte Carlo with dimension-independent likelihood-informed proposals,*SIAM/ASA Journal on Uncertainty Quantification, 6 (2018), pp. 762–786.

##### Multilevel sequential Monte Carlo with dimension-independent likelihood-informed proposals

In this article we develop a new sequential Monte Carlo (SMC) method for multilevel (ML) Monte Carlo estimation. In particular, the method can be used to estimate expectations with respect to a target probability distribution over an infinite-dimensional and non-compact space as given, for example, by a Bayesian inverse problem with Gaussian random field prior. Under suitable assumptions the MLSMC method has the optimal $O(\epsilon^{-2})$ bound on the cost to obtain a mean-square error of $O(\epsilon^{2})$. The algorithm is accelerated by dimension-independent likelihood-informed (DILI) proposals designed for Gaussian priors, leveraging a novel variation which uses empirical sample covariance information in lieu of Hessian information, hence eliminating the requirement for gradient evaluations. The efficiency of the algorithm is illustrated on two examples: inversion of noisy pressure measurements in a PDE model of Darcy flow to recover the posterior distribution of the permeability field, and inversion of noisy measurements of the solution of an SDE to recover the posterior path measure.

##### BibTeX entry

@article{art_62, author = "A. Beskos and A. Jasra and K. Law and Y. M. Marzouk and Y. Zhou", journal = "SIAM/ASA Journal on Uncertainty Quantification", pages = "762--786", title = "Multilevel sequential Monte Carlo with dimension-independent likelihood-informed proposals", volume = "6", year = "2018", doi = "10.1137/17M1120993", abstract = "In this article we develop a new sequential Monte Carlo (SMC) method for multilevel (ML) Monte Carlo estimation. In particular, the method can be used to estimate expectations with respect to a target probability distribution over an infinite-dimensional and non-compact space as given, for example, by a Bayesian inverse problem with Gaussian random field prior. Under suitable assumptions the MLSMC method has the optimal $O(\epsilon^{-2})$ bound on the cost to obtain a mean-square error of $O(\epsilon^{2})$. The algorithm is accelerated by dimension-independent likelihood-informed (DILI) proposals designed for Gaussian priors, leveraging a novel variation which uses empirical sample covariance information in lieu of Hessian information, hence eliminating the requirement for gradient evaluations. The efficiency of the algorithm is illustrated on two examples: inversion of noisy pressure measurements in a PDE model of Darcy flow to recover the posterior distribution of the permeability field, and inversion of noisy measurements of the solution of an SDE to recover the posterior path measure.", keywords = "multilevel Monte Carlo, sequential Monte Carlo, Bayesian inverse problem" }

*Efficient design and verification of diagnostics for impurity transport experiments,*Review of Scientific Instruments, 89 (2018), pp. 013504.

##### Efficient design and verification of diagnostics for impurity transport experiments

Recent attempts to measure impurity transport in Alcator C-Mod using an x-ray imaging crystal spectrometer and laser blow-off impurity injector have failed to yield unique reconstructions of the transport coefficient profiles. This paper presents a fast, linearized model which was constructed to estimate diagnostic requirements for impurity transport experiments. The analysis shows that the spectroscopic diagnostics on Alcator C-Mod should be capable of inferring simple profiles of impurity diffusion $D_Z$ and convection $V_Z$ accurate to better than ±10% uncertainty, suggesting that the failure to infer unique $D_Z$, $V_Z$ from experimental data is attributable to an inadequate analysis procedure rather than the result of insufficient diagnostics. Furthermore, the analysis reveals that spatial resolution is more important than temporal resolution for typical diagnostic sampling rates and noise levels. This approach can be adapted to design and verify diagnostics for transport experiments on any magnetic confinement device.

##### BibTeX entry

@article{art_61, author = "M. Chilenski and M. Greenwald and Y. M. Marzouk and J. Rice and A. White", journal = "Review of Scientific Instruments", month = "1", pages = "013504", title = "Efficient design and verification of diagnostics for impurity transport experiments", volume = "89", year = "2018", doi = "10.1063/1.4997251", abstract = "Recent attempts to measure impurity transport in Alcator C-Mod using an x-ray imaging crystal spectrometer and laser blow-off impurity injector have failed to yield unique reconstructions of the transport coefficient profiles. This paper presents a fast, linearized model which was constructed to estimate diagnostic requirements for impurity transport experiments. The analysis shows that the spectroscopic diagnostics on Alcator C-Mod should be capable of inferring simple profiles of impurity diffusion $D_Z$ and convection $V_Z$ accurate to better than ±10% uncertainty, suggesting that the failure to infer unique $D_Z$, $V_Z$ from experimental data is attributable to an inadequate analysis procedure rather than the result of insufficient diagnostics. Furthermore, the analysis reveals that spatial resolution is more important than temporal resolution for typical diagnostic sampling rates and noise levels. This approach can be adapted to design and verify diagnostics for transport experiments on any magnetic confinement device." }

*Parallel local approximation MCMC for expensive models,*SIAM/ASA Journal on Uncertainty Quantification, 6 (2018), pp. 339–373.

##### Parallel local approximation MCMC for expensive models

Performing Bayesian inference via Markov chain Monte Carlo (MCMC) can be exceedingly expensive when posterior evaluations invoke the evaluation of a. computationally expensive model, such as a system of partial differential equations. In recent work [Conrad et al. JASA 2016, arXiv:1402.1694], we described a framework for constructing and refining local approximations of such models during an MCMC simulation. These posterior–adapted approximations harness regularity of the model to reduce the computational cost of inference while preserving asymptotic exactness of the Markov chain. Here we describe two extensions of that work. First, we prove that samplers running in parallel can collaboratively construct a shared posterior approximation while ensuring ergodicity of each associated chain, providing a novel opportunity for exploiting parallel computation in MCMC. Second, focusing on the Metropolis–adjusted Langevin algorithm, we describe how a proposal distribution can successfully employ gradients and other relevant information extracted from the approximation. We investigate the practical performance of our strategies using two challenging inference problems, the first in subsurface hydrology and the second in glaciology. Using local approximations constructed via parallel chains, we successfully reduce the run time needed to characterize the posterior distributions in these problems from days to hours and from months to days, respectively, dramatically improving the tractability of Bayesian inference.

##### BibTeX entry

@article{art_60, author = "P. Conrad and A. Davis and Y. M. Marzouk and N. Pillai and A. Smith", journal = "SIAM/ASA Journal on Uncertainty Quantification", number = "1", pages = "339--373", title = "Parallel local approximation MCMC for expensive models", volume = "6", year = "2018", doi = "10.1137/16M1084080", abstract = "Performing Bayesian inference via Markov chain Monte Carlo (MCMC) can be exceedingly expensive when posterior evaluations invoke the evaluation of a. computationally expensive model, such as a system of partial differential equations. In recent work [Conrad et al. JASA 2016, arXiv:1402.1694], we described a framework for constructing and refining local approximations of such models during an MCMC simulation. These posterior--adapted approximations harness regularity of the model to reduce the computational cost of inference while preserving asymptotic exactness of the Markov chain. Here we describe two extensions of that work. First, we prove that samplers running in parallel can collaboratively construct a shared posterior approximation while ensuring ergodicity of each associated chain, providing a novel opportunity for exploiting parallel computation in MCMC. Second, focusing on the Metropolis--adjusted Langevin algorithm, we describe how a proposal distribution can successfully employ gradients and other relevant information extracted from the approximation. We investigate the practical performance of our strategies using two challenging inference problems, the first in subsurface hydrology and the second in glaciology. Using local approximations constructed via parallel chains, we successfully reduce the run time needed to characterize the posterior distributions in these problems from days to hours and from months to days, respectively, dramatically improving the tractability of Bayesian inference.", keywords = "Markov chain Monte Carlo, parallel computing, Metropolis-adjusted Langevin algorithm, Bayesian inference, approximation theory, local regression, surrogate modeling" }

*High-dimensional stochastic optimal control using continuous tensor decompositions,*International Journal of Robotics Research, 37 (2018), pp. 340–377.

##### High-dimensional stochastic optimal control using continuous tensor decompositions

Motion planning and control problems are embedded and essential in almost all robotics applications. These problems are often formulated as stochastic optimal control problems and solved using dynamic programming algorithms. Unfortunately, most existing algorithms that guarantee convergence to optimal solutions suffer from the curse of dimensionality: the run time of the algorithm grows exponentially with the dimension of the state space of the system. We propose novel dynamic programming algorithms that alleviate the curse of dimensionality in problems that exhibit certain low-rank structure. The proposed algorithms are based on continuous tensor decompositions recently developed by the authors. Essentially, the algorithms represent high-dimensional functions (e.g., the value function) in a compressed format, and directly perform dynamic programming computations (e.g., value iteration, policy iteration) in this format. Under certain technical assumptions, the new algorithms guarantee convergence towards optimal solutions with arbitrary precision. Furthermore, the run times of the new algorithms scale polynomially with the state dimension and polynomially with the ranks of the value function. This approach realizes substantial computational savings in “compressible” problem instances, where value functions admit low-rank approximations. We demonstrate the new algorithms in a wide range of problems, including a simulated six-dimensional agile quadcopter maneuvering example and a seven-dimensional aircraft perching example. In some of these examples, we estimate computational savings of up to 10 orders of magnitude over standard value iteration algorithms. We further demonstrate the algorithms running in real time on board a quadcopter during a flight experiment under motion capture.

##### BibTeX entry

@article{art_59, author = "A. Gorodetsky and S. Karaman and Y. M. Marzouk", journal = "International Journal of Robotics Research", month = "3", number = "2-3", pages = "340--377", title = "High-dimensional stochastic optimal control using continuous tensor decompositions", volume = "37", year = "2018", doi = "10.1177/0278364917753994", abstract = "Motion planning and control problems are embedded and essential in almost all robotics applications. These problems are often formulated as stochastic optimal control problems and solved using dynamic programming algorithms. Unfortunately, most existing algorithms that guarantee convergence to optimal solutions suffer from the curse of dimensionality: the run time of the algorithm grows exponentially with the dimension of the state space of the system. We propose novel dynamic programming algorithms that alleviate the curse of dimensionality in problems that exhibit certain low-rank structure. The proposed algorithms are based on continuous tensor decompositions recently developed by the authors. Essentially, the algorithms represent high-dimensional functions (e.g., the value function) in a compressed format, and directly perform dynamic programming computations (e.g., value iteration, policy iteration) in this format. Under certain technical assumptions, the new algorithms guarantee convergence towards optimal solutions with arbitrary precision. Furthermore, the run times of the new algorithms scale polynomially with the state dimension and polynomially with the ranks of the value function. This approach realizes substantial computational savings in “compressible” problem instances, where value functions admit low-rank approximations. We demonstrate the new algorithms in a wide range of problems, including a simulated six-dimensional agile quadcopter maneuvering example and a seven-dimensional aircraft perching example. In some of these examples, we estimate computational savings of up to 10 orders of magnitude over standard value iteration algorithms. We further demonstrate the algorithms running in real time on board a quadcopter during a flight experiment under motion capture.", keywords = "stochastic optimal control, motion planning, dynamic programming, tensor decompositions" }

*Inferring fault frictional and reservoir hydraulic properties from injection-induced seismicity,*Geophysical Research Letters, 45 (2018), pp. 1313–1320.

##### Inferring fault frictional and reservoir hydraulic properties from injection-induced seismicity

Characterizing the rheological properties of faults and the evolution of fault friction during seismic slip are fundamental problems in geology and seismology. Recent increases in the frequency of induced earthquakes have intensified the need for robust methods to estimate fault properties. Here, we present a novel approach for aquifer- and fault-property estimation, which combines coupled multiphysics simulation of injection-induced seismicity with adaptive surrogate-based Bayesian inversion. In a synthetic 2D model, we use aquifer pressure, ground displacements, and fault slip measurements during fluid injection to estimate the dynamic fault friction, the critical slip distance, and the aquifer permeability. Our forward model allows us to observe non-monotonic evolutions of shear traction and slip on the fault resulting from the interplay of several physical mechanisms, including injection-induced aquifer expansion, stress transfer along the fault, and slip-induced stress relaxation. This interplay provides the basis for a successful joint inversion of induced seismicity, yielding well-informed Bayesian posterior distributions of dynamic friction and critical slip. We uncover an inverse relationship between dynamic friction and critical slip distance, which is in agreement with the small dynamic friction and large critical slip reported during seismicity on mature faults.

##### BibTeX entry

@article{art_58, author = "J. Jagalur-Mohan and B. Jha and Z. Wang and R. Juanes and Y. M. Marzouk", journal = "Geophysical Research Letters", number = "3", pages = "1313--1320", title = "Inferring fault frictional and reservoir hydraulic properties from injection-induced seismicity", volume = "45", year = "2018", doi = "10.1002/2017GL075925", abstract = "Characterizing the rheological properties of faults and the evolution of fault friction during seismic slip are fundamental problems in geology and seismology. Recent increases in the frequency of induced earthquakes have intensified the need for robust methods to estimate fault properties. Here, we present a novel approach for aquifer- and fault-property estimation, which combines coupled multiphysics simulation of injection-induced seismicity with adaptive surrogate-based Bayesian inversion. In a synthetic 2D model, we use aquifer pressure, ground displacements, and fault slip measurements during fluid injection to estimate the dynamic fault friction, the critical slip distance, and the aquifer permeability. Our forward model allows us to observe non-monotonic evolutions of shear traction and slip on the fault resulting from the interplay of several physical mechanisms, including injection-induced aquifer expansion, stress transfer along the fault, and slip-induced stress relaxation. This interplay provides the basis for a successful joint inversion of induced seismicity, yielding well-informed Bayesian posterior distributions of dynamic friction and critical slip. We uncover an inverse relationship between dynamic friction and critical slip distance, which is in agreement with the small dynamic friction and large critical slip reported during seismicity on mature faults." }

*Inference via low-dimensional couplings,*The Journal of Machine Learning Research, 19 (2018), pp. 1–71.

##### Inference via low-dimensional couplings

Integration against an intractable probability measure is among the fundamental challenges of statistical inference, particularly in the Bayesian setting. A principled approach to this problem seeks a deterministic coupling of the measure of interest with a tractable ‘‘reference’’ measure (e.g., a standard Gaussian). This coupling is induced by a transport map, and enables direct simulation from the desired measure simply by evaluating the transport map at samples from the reference. Yet characterizing such a map–-e.g., representing and evaluating it–-grows challenging in high dimensions. The central contribution of this paper is to establish a link between the Markov properties of the target measure and the existence of certain low-dimensional couplings, induced by transport maps that are sparse or decomposable. Our analysis not only facilitates the construction of couplings in high-dimensional settings, but also suggests new inference methodologies. For instance, in the context of nonlinear and non-Gaussian state space models, we describe new variational algorithms for online filtering, smoothing, and parameter estimation. These algorithms implicitly characterize–-via a transport map–-the full posterior distribution of the sequential inference problem using local operations only incrementally more complex than regular filtering, while avoiding importance sampling or resampling.

##### BibTeX entry

@article{art_57, author = "A. Spantini and D. Bigoni and Y. M. Marzouk", journal = "The Journal of Machine Learning Research", number = "66", pages = "1--71", title = "Inference via low-dimensional couplings", volume = "19", year = "2018", abstract = "Integration against an intractable probability measure is among the fundamental challenges of statistical inference, particularly in the Bayesian setting. A principled approach to this problem seeks a deterministic coupling of the measure of interest with a tractable ``reference'' measure (e.g., a standard Gaussian). This coupling is induced by a transport map, and enables direct simulation from the desired measure simply by evaluating the transport map at samples from the reference. Yet characterizing such a map---e.g., representing and evaluating it---grows challenging in high dimensions. The central contribution of this paper is to establish a link between the Markov properties of the target measure and the existence of certain low-dimensional couplings, induced by transport maps that are sparse or decomposable. Our analysis not only facilitates the construction of couplings in high-dimensional settings, but also suggests new inference methodologies. For instance, in the context of nonlinear and non-Gaussian state space models, we describe new variational algorithms for online filtering, smoothing, and parameter estimation. These algorithms implicitly characterize---via a transport map---the full posterior distribution of the sequential inference problem using local operations only incrementally more complex than regular filtering, while avoiding importance sampling or resampling.", keywords = "transport map, rearrangement, Bayesian inference, variational inference, graphical model, Markov random field, sparsity, Kalman recursions, filtering, smoothing, joint parameter-state estimation, state-space model" }

*Waveform-based Bayesian full moment tensor inversion and uncertainty determination for induced seismicity in an oil/gas field,*Geophysical Journal International, 212 (2018), pp. 1963–1985.

##### Waveform-based Bayesian full moment tensor inversion and uncertainty determination for induced seismicity in an oil/gas field

Small earthquakes occur due to natural tectonic motions and are induced by oil and gas production processes. In many oil/gas fields and hydrofracking processeshydrofrackings, induced earthquakes result from fluid extraction or injection. The locations and source mechanisms of these earthquakes provide valuable information about the reservoirs. Analysis of induced seismic events has mostly assumed a double-couple source mechanism. However, recent studies have shown a non-negligible percentage of non-double-couple components of source moment tensors in hydraulic fracturing events, assuming a full moment tensor source mechanism. Without uncertainty quantification of the moment tensor solution, it is difficult to determine the reliability of these source models. This study develops a Bayesian method to perform waveform-based full moment tensor inversion and uncertainty quantification for induced seismic events, accounting for both location and velocity model uncertainties. We conduct tests with synthetic events to validate the method, and then apply our newly developed Bayesian inversion approach to real induced seismicity in an oil/gas field in the sultanate of Oman—determining the uncertainties in the source mechanism and in the location of that event.

##### BibTeX entry

@article{art_54, author = "C. Gu and Y. M. Marzouk and M. N. Toksöz", journal = "Geophysical Journal International", month = "3", number = "1", pages = "1963--1985", title = "Waveform-based Bayesian full moment tensor inversion and uncertainty determination for induced seismicity in an oil/gas field", volume = "212", year = "2018", doi = "10.1093/gji/ggx517", abstract = "Small earthquakes occur due to natural tectonic motions and are induced by oil and gas production processes. In many oil/gas fields and hydrofracking processeshydrofrackings, induced earthquakes result from fluid extraction or injection. The locations and source mechanisms of these earthquakes provide valuable information about the reservoirs. Analysis of induced seismic events has mostly assumed a double-couple source mechanism. However, recent studies have shown a non-negligible percentage of non-double-couple components of source moment tensors in hydraulic fracturing events, assuming a full moment tensor source mechanism. Without uncertainty quantification of the moment tensor solution, it is difficult to determine the reliability of these source models. This study develops a Bayesian method to perform waveform-based full moment tensor inversion and uncertainty quantification for induced seismic events, accounting for both location and velocity model uncertainties. We conduct tests with synthetic events to validate the method, and then apply our newly developed Bayesian inversion approach to real induced seismicity in an oil/gas field in the sultanate of Oman—determining the uncertainties in the source mechanism and in the location of that event.", keywords = "induced seismicity, Bayesian inversion, full moment tensor, uncertainty quantification" }

*Data-driven integral boundary layer modeling for airfoil performance prediction in the laminar regime,*AIAA Journal, 56 (2018), pp. 482–496.

##### Data-driven integral boundary layer modeling for airfoil performance prediction in the laminar regime

Many simulation tools for airfoil analysis and design are based on an integral approximation of the boundary layer. This approximate formulation cannot resolve the full dynamics of boundary-layer flows and hence requires additional models to account for unresolved effects. This paper introduces a new, data-driven, probabilistic model of these unresolved effects for the incompressible and laminar regime. To construct this model, methods from supervised learning have been applied to a dataset containing over 1550 airfoils. The result is a model that 1) is based on a large dataset of realistic airfoil configurations, and 2) quantifies the model inadequacy associated with the use of an approximate boundary-layer formulation. A stochastic version of the airfoil design tool XFOIL has also been created by replacing its original boundary-layer model with the probabilistic model developed here. This stochastic version of XFOIL has been applied to compute the drag polars of two airfoils at low Reynolds numbers and the results are compared with experimental data.

##### BibTeX entry

@article{art_53, author = "A. Marques and Q. Wang and Y. M. Marzouk", journal = "AIAA Journal", number = "2", pages = "482--496", title = "Data-driven integral boundary layer modeling for airfoil performance prediction in the laminar regime", volume = "56", year = "2018", doi = "10.2514/1.J055877", abstract = "Many simulation tools for airfoil analysis and design are based on an integral approximation of the boundary layer. This approximate formulation cannot resolve the full dynamics of boundary-layer flows and hence requires additional models to account for unresolved effects. This paper introduces a new, data-driven, probabilistic model of these unresolved effects for the incompressible and laminar regime. To construct this model, methods from supervised learning have been applied to a dataset containing over 1550 airfoils. The result is a model that 1) is based on a large dataset of realistic airfoil configurations, and 2) quantifies the model inadequacy associated with the use of an approximate boundary-layer formulation. A stochastic version of the airfoil design tool XFOIL has also been created by replacing its original boundary-layer model with the probabilistic model developed here. This stochastic version of XFOIL has been applied to compute the drag polars of two airfoils at low Reynolds numbers and the results are compared with experimental data." }

*Shared low-dimensional subspaces for propagating kinetic uncertainty to multiple outputs,*Combustion and Flame, 190 (2018), pp. 146–157.

##### Shared low-dimensional subspaces for propagating kinetic uncertainty to multiple outputs

Forward propagation of kinetic uncertainty in combustion simulations usually adopts response surface techniques to accelerate Monte Carlo sampling. Yet it is computationally challenging to build response surfaces for high-dimensional input parameters and expensive combustion models. This study uses the active subspace method to identify a low-dimensional subspace of the input space, within which response surfaces can be built. Active subspace methods have previously been developed only for single (scalar) model outputs, however. This paper introduces a new method that can simultaneously approximate the marginal probability density functions of multiple outputs using a single low-dimensional shared subspace. We identify the shared subspace by solving a least-squares system to compute an appropriate combination of single-output active subspaces. Because the identification of the active subspace for each individual output may require a significant number of samples, this process may be computationally intractable for expensive models such as turbulent combustion simulations. Instead, we propose a heuristic approach that learns the relevant subspaces from cheaper combustion models. The performance of the active subspace for a single output, and of the shared subspace for multiple outputs, is first demonstrated with the ignition delay times and laminar flame speeds of hydrogen/air, methane/air, and dimethyl ether (DME)/air mixtures. Then we demonstrate extrapolatory performance of the shared subspace: using a shared subspace trained on the ignition delays at constant volume, we perform forward propagation of kinetic uncertainties through zero-dimensional HCCI simulations—in particular, single-stage ignition of a natural gas/air mixture and two-stage ignition of a DME/air mixture. We show that the shared subspace can accurately reproduce the probability of ignition failure and the probability density of ignition crank angle conditioned on successful ignition, given uncertainty in the kinetics.

##### BibTeX entry

@article{art_52, author = "W. Ji and J. Wang and O. Zahm and Y. M. Marzouk and B. Yang and Z. Ren and C. K. Law", journal = "Combustion and Flame", pages = "146--157", title = "Shared low-dimensional subspaces for propagating kinetic uncertainty to multiple outputs", volume = "190", year = "2018", doi = "10.1016/j.combustflame.2017.11.021", abstract = "Forward propagation of kinetic uncertainty in combustion simulations usually adopts response surface techniques to accelerate Monte Carlo sampling. Yet it is computationally challenging to build response surfaces for high-dimensional input parameters and expensive combustion models. This study uses the active subspace method to identify a low-dimensional subspace of the input space, within which response surfaces can be built. Active subspace methods have previously been developed only for single (scalar) model outputs, however. This paper introduces a new method that can simultaneously approximate the marginal probability density functions of multiple outputs using a single low-dimensional shared subspace. We identify the shared subspace by solving a least-squares system to compute an appropriate combination of single-output active subspaces. Because the identification of the active subspace for each individual output may require a significant number of samples, this process may be computationally intractable for expensive models such as turbulent combustion simulations. Instead, we propose a heuristic approach that learns the relevant subspaces from cheaper combustion models. The performance of the active subspace for a single output, and of the shared subspace for multiple outputs, is first demonstrated with the ignition delay times and laminar flame speeds of hydrogen/air, methane/air, and dimethyl ether (DME)/air mixtures. Then we demonstrate extrapolatory performance of the shared subspace: using a shared subspace trained on the ignition delays at constant volume, we perform forward propagation of kinetic uncertainties through zero-dimensional HCCI simulations—in particular, single-stage ignition of a natural gas/air mixture and two-stage ignition of a DME/air mixture. We show that the shared subspace can accurately reproduce the probability of ignition failure and the probability density of ignition crank angle conditioned on successful ignition, given uncertainty in the kinetics.", keywords = "Uncertainty propagation, dimension reduction, active subspaces, shared subspaces, multiple outputs" }

*Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting,*Advances in Neural Information Processing Systems (NIPS), 30 (2017).

##### Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting

We present an algorithm to identify sparse dependence structure in continuous and non-Gaussian probability distributions, given a corresponding set of data. The conditional independence structure of an arbitrary distribution can be represented as an undirected graph (or Markov random field), but most algorithms for learning this structure are restricted to the discrete or Gaussian cases. Our new approach allows for more realistic and accurate descriptions of the distribution in question, and in turn better estimates of its sparse Markov structure. Sparsity in the graph is of interest as it can accelerate inference, improve sampling methods, and reveal important dependencies between variables. The algorithm relies on exploiting the connection between the sparsity of the graph and the sparsity of transport maps, which deterministically couple one probability measure to another.

##### BibTeX entry

@article{art_51, author = "R. Morrison and R. Baptista and Y. M. Marzouk", journal = "Advances in Neural Information Processing Systems (NIPS)", title = "Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting", volume = "30", year = "2017", abstract = "We present an algorithm to identify sparse dependence structure in continuous and non-Gaussian probability distributions, given a corresponding set of data. The conditional independence structure of an arbitrary distribution can be represented as an undirected graph (or Markov random field), but most algorithms for learning this structure are restricted to the discrete or Gaussian cases. Our new approach allows for more realistic and accurate descriptions of the distribution in question, and in turn better estimates of its sparse Markov structure. Sparsity in the graph is of interest as it can accelerate inference, improve sampling methods, and reveal important dependencies between variables. The algorithm relies on exploiting the connection between the sparsity of the graph and the sparsity of transport maps, which deterministically couple one probability measure to another." }

*Experimentally testing the dependence of momentum transport on second derivatives using Gaussian process regression,*Nuclear Fusion, 57 (2017), pp. 126013.

##### Experimentally testing the dependence of momentum transport on second derivatives using Gaussian process regression

It remains an open question to explain the dramatic change in intrinsic rotation induced by slight changes in electron density (White et al. 2013, Phys. Plasmas 20, 056106). One proposed explanation is that momentum transport is sensitive to the second derivatives of the temperature and density profiles (Lee et al. 2015, Plasma Phys. Controlled Fusion 57, 125006), but it is widely considered to be impossible to measure these higher derivatives. In this paper, we show that it is possible to estimate second derivatives of electron density and temperature using a nonparametric regression technique known as Gaussian process regression (GPR). This technique avoids over-constraining the fit by not assuming an explicit functional form for the fitted curve. The uncertainties, obtained rigorously using Markov chain Monte Carlo (MCMC) sampling, are small enough that it is reasonable to explore hypotheses which depend on second derivatives. It is found that the differences in the second derivatives of $n_e$ and $T_e$ between the peaked and hollow rotation cases are rather small, suggesting that changes in the second derivatives are not likely to explain the experimental results.

##### BibTeX entry

@article{art_50, author = "M. Chilenski and M. Greenwald and A. Hubbard and J. Hughes and J. Lee and Y. M. Marzouk and J. Rice and A. White", journal = "Nuclear Fusion", month = "9", number = "12", pages = "126013", title = "Experimentally testing the dependence of momentum transport on second derivatives using Gaussian process regression", volume = "57", year = "2017", doi = "10.1088/1741-4326/aa8387", abstract = "It remains an open question to explain the dramatic change in intrinsic rotation induced by slight changes in electron density (White et al. 2013, Phys. Plasmas 20, 056106). One proposed explanation is that momentum transport is sensitive to the second derivatives of the temperature and density profiles (Lee et al. 2015, Plasma Phys. Controlled Fusion 57, 125006), but it is widely considered to be impossible to measure these higher derivatives. In this paper, we show that it is possible to estimate second derivatives of electron density and temperature using a nonparametric regression technique known as Gaussian process regression (GPR). This technique avoids over-constraining the fit by not assuming an explicit functional form for the fitted curve. The uncertainties, obtained rigorously using Markov chain Monte Carlo (MCMC) sampling, are small enough that it is reasonable to explore hypotheses which depend on second derivatives. It is found that the differences in the second derivatives of $n_e$ and $T_e$ between the peaked and hollow rotation cases are rather small, suggesting that changes in the second derivatives are not likely to explain the experimental results." }

*Goal-oriented optimal approximations of Bayesian linear inverse problems,*SIAM Journal on Scientific Computing, 39 (2017), pp. S167–S196.

##### Goal-oriented optimal approximations of Bayesian linear inverse problems

We propose optimal dimensionality reduction techniques for the solution of goal-oriented linear-Gaussian inverse problems, where the quantity of interest (QoI) is a function of the inversion parameters. These approximations are suitable for large-scale applications. In particular, we study the approximation of the posterior covariance of the QoI as a low-rank negative update of its prior covariance, and prove optimality of this update with respect to the natural geodesic distance on the manifold of symmetric positive definite matrices. Assuming exact knowledge of the posterior mean of the QoI, the optimality results extend to optimality in distribution with respect to the Kullback-Leibler divergence and the Hellinger distance between the associated distributions. We also propose approximation of the posterior mean of the QoI as a low-rank linear function of the data, and prove optimality of this approximation with respect to a weighted Bayes risk. Both of these optimal approximations avoid the explicit computation of the full posterior distribution of the parameters and instead focus on directions that are well informed by the data and relevant to the QoI. These directions stem from a balance among all the components of the goal-oriented inverse problem: prior information, forward model, measurement noise, and ultimate goals. We illustrate the theory using a high-dimensional inverse problem in heat transfer.

##### BibTeX entry

@article{art_49, author = "A. Spantini and T. Cui and K. Willcox and L. Tenorio and Y. M. Marzouk", journal = "SIAM Journal on Scientific Computing", number = "5", pages = "S167--S196", title = "Goal-oriented optimal approximations of Bayesian linear inverse problems", volume = "39", year = "2017", doi = "10.1137/16M1082123", abstract = "We propose optimal dimensionality reduction techniques for the solution of goal-oriented linear-Gaussian inverse problems, where the quantity of interest (QoI) is a function of the inversion parameters. These approximations are suitable for large-scale applications. In particular, we study the approximation of the posterior covariance of the QoI as a low-rank negative update of its prior covariance, and prove optimality of this update with respect to the natural geodesic distance on the manifold of symmetric positive definite matrices. Assuming exact knowledge of the posterior mean of the QoI, the optimality results extend to optimality in distribution with respect to the Kullback-Leibler divergence and the Hellinger distance between the associated distributions. We also propose approximation of the posterior mean of the QoI as a low-rank linear function of the data, and prove optimality of this approximation with respect to a weighted Bayes risk. Both of these optimal approximations avoid the explicit computation of the full posterior distribution of the parameters and instead focus on directions that are well informed by the data and relevant to the QoI. These directions stem from a balance among all the components of the goal-oriented inverse problem: prior information, forward model, measurement noise, and ultimate goals. We illustrate the theory using a high-dimensional inverse problem in heat transfer.", keywords = "inverse problems, goal--oriented, Bayesian inference, low-rank approximation, covariance approximation, Riemannian metric, geodesic distance, posterior mean approximation, Bayes risk, optimality" }

*Bayesian inverse problems with l1 priors: a randomize-then-optimize approach,*SIAM Journal on Scientific Computing, 39 (2017), pp. S140–S166.

##### Bayesian inverse problems with l1 priors: a randomize-then-optimize approach

Prior distributions for Bayesian inference that rely on the $l_1$-norm of the parameters are of considerable interest, in part because they promote parameter fields with less regularity than Gaussian priors (e.g., discontinuities and blockiness). These $l_1$-type priors include the total variation (TV) prior and the Besov space $B^s_{1,1}$ prior, and in general yield non-Gaussian posterior distributions. Sampling from these posteriors is challenging, particularly in the inverse problem setting where the parameter space is high-dimensional and the forward problem may be nonlinear. This paper extends the randomize-then-optimize (RTO) method, an optimization-based sampling algorithm developed for Bayesian inverse problems with Gaussian priors, to inverse problems with $l_1$-type priors. We use a variable transformation to convert an $l_1$-type prior to a standard Gaussian prior, such that the posterior distribution of the transformed parameters is amenable to Metropolized sampling via RTO. We demonstrate this approach on several deconvolution problems and an elliptic PDE inverse problem, using TV or Besov space $B^s_{1,1}$ priors. Our results show that the transformed RTO algorithm characterizes the correct posterior distribution and can be more efficient than other sampling algorithms. The variable transformation can also be extended to other non-Gaussian priors.

##### BibTeX entry

@article{art_48, author = "Z. Wang and J. M. Bardsley and A. Solonen and T. Cui and Y. M. Marzouk", journal = "SIAM Journal on Scientific Computing", number = "5", pages = "S140--S166", title = "Bayesian inverse problems with l1 priors: a randomize-then-optimize approach", volume = "39", year = "2017", doi = "10.1137/16M1080938", abstract = "Prior distributions for Bayesian inference that rely on the $l_1$-norm of the parameters are of considerable interest, in part because they promote parameter fields with less regularity than Gaussian priors (e.g., discontinuities and blockiness). These $l_1$-type priors include the total variation (TV) prior and the Besov space $B^s_{1,1}$ prior, and in general yield non-Gaussian posterior distributions. Sampling from these posteriors is challenging, particularly in the inverse problem setting where the parameter space is high-dimensional and the forward problem may be nonlinear. This paper extends the randomize-then-optimize (RTO) method, an optimization-based sampling algorithm developed for Bayesian inverse problems with Gaussian priors, to inverse problems with $l_1$-type priors. We use a variable transformation to convert an $l_1$-type prior to a standard Gaussian prior, such that the posterior distribution of the transformed parameters is amenable to Metropolized sampling via RTO. We demonstrate this approach on several deconvolution problems and an elliptic PDE inverse problem, using TV or Besov space $B^s_{1,1}$ priors. Our results show that the transformed RTO algorithm characterizes the correct posterior distribution and can be more efficient than other sampling algorithms. The variable transformation can also be extended to other non-Gaussian priors.", keywords = "Inverse problems, Bayesian inference, Monte Carlo methods" }

*A multiscale strategy for Bayesian inference using transport maps,*SIAM/ASA Journal on Uncertainty Quantification, 4 (2016), pp. 1160–1190.

##### A multiscale strategy for Bayesian inference using transport maps

In many inverse problems, model parameters cannot be precisely determined from observational data. Bayesian inference provides a mechanism for capturing the resulting parameter uncertainty, but typically at a high computational cost. This work introduces a multiscale decomposition that exploits conditional independence across scales, when present in certain classes of inverse problems, to decouple Bayesian inference into two stages: (1) a computationally tractable coarse-scale inference problem; and (2) a mapping of the low-dimensional coarse-scale posterior distribution into the original high-dimensional parameter space. This decomposition relies on a characterization of the non-Gaussian joint distribution of coarse- and fine-scale quantities via optimal transport maps. We demonstrate our approach on a sequence of inverse problems arising in subsurface flow, using the multiscale finite element method to discretize the steady state pressure equation. We compare the multiscale strategy with full-dimensional Markov chain Monte Carlo on a problem of moderate dimension (100 parameters) and then use it to infer a conductivity field described by over 10,000 parameters.

##### BibTeX entry

@article{parno-transport-juq-2016, author = "M. Parno and T. Moselhy and Y. M. Marzouk", journal = "SIAM/ASA Journal on Uncertainty Quantification", month = "10", number = "1", pages = "1160--1190", title = "A multiscale strategy for Bayesian inference using transport maps", volume = "4", year = "2016", doi = "10.1137/15M1032478", abstract = "In many inverse problems, model parameters cannot be precisely determined from observational data. Bayesian inference provides a mechanism for capturing the resulting parameter uncertainty, but typically at a high computational cost. This work introduces a multiscale decomposition that exploits conditional independence across scales, when present in certain classes of inverse problems, to decouple Bayesian inference into two stages: (1) a computationally tractable coarse-scale inference problem; and (2) a mapping of the low-dimensional coarse-scale posterior distribution into the original high-dimensional parameter space. This decomposition relies on a characterization of the non-Gaussian joint distribution of coarse- and fine-scale quantities via optimal transport maps. We demonstrate our approach on a sequence of inverse problems arising in subsurface flow, using the multiscale finite element method to discretize the steady state pressure equation. We compare the multiscale strategy with full-dimensional Markov chain Monte Carlo on a problem of moderate dimension (100 parameters) and then use it to infer a conductivity field described by over 10,000 parameters.", keywords = "Bayesian inference; inverse problems; multiscale modeling; multiscale finite element method; optimal transportation; Markov chain Monte Carlo" }

*Spectral tensor-train decomposition,*SIAM Journal on Scientific Computing, 38 (2016), pp. A2405–A2439.

##### Spectral tensor-train decomposition

The accurate approximation of high-dimensional functions is an essential task in uncertainty quantification and many other fields. We propose a new function approximation scheme based on a spectral extension of the tensor-train (TT) decomposition. We first define a functional version of the TT decomposition and analyze its properties. We obtain results on the convergence of the decomposition, revealing links between the regularity of the function, the dimension of the input space, and the TT ranks. We also show that the regularity of the target function is preserved by the univariate functions (i.e., the "cores") comprising the functional TT decomposition. This result motivates an approximation scheme employing polynomial approximations of the cores. For functions with appropriate regularity, the resulting \textit{spectral tensor-train decomposition} combines the favorable dimension-scaling of the TT decomposition with the spectral convergence rate of polynomial approximations, yielding efficient and accurate surrogates for high-dimensional functions. To construct these decompositions, we use the sampling algorithm \texttt{TT-DMRG-cross} to obtain the TT decomposition of tensors resulting from suitable discretizations of the target function. We assess the performance of the method on a range of numerical examples: a modifed set of Genz functions with dimension up to 100, and functions with mixed Fourier modes or with local features. We observe significant improvements in performance over an anisotropic adaptive Smolyak approach. The method is also used to approximate the solution of an elliptic PDE with random input data. The open source software and examples presented in this work are available online.

##### BibTeX entry

@article{art_45, author = "D. Bigoni and A. Engsig-Karup and Y. M. Marzouk", journal = "SIAM Journal on Scientific Computing", number = "4", pages = "A2405--A2439", title = "Spectral tensor-train decomposition", volume = "38", year = "2016", doi = "10.1137/15M1036919", abstract = "The accurate approximation of high-dimensional functions is an essential task in uncertainty quantification and many other fields. We propose a new function approximation scheme based on a spectral extension of the tensor-train (TT) decomposition. We first define a functional version of the TT decomposition and analyze its properties. We obtain results on the convergence of the decomposition, revealing links between the regularity of the function, the dimension of the input space, and the TT ranks. We also show that the regularity of the target function is preserved by the univariate functions (i.e., the "cores") comprising the functional TT decomposition. This result motivates an approximation scheme employing polynomial approximations of the cores. For functions with appropriate regularity, the resulting \textit{spectral tensor-train decomposition} combines the favorable dimension-scaling of the TT decomposition with the spectral convergence rate of polynomial approximations, yielding efficient and accurate surrogates for high-dimensional functions. To construct these decompositions, we use the sampling algorithm \texttt{TT-DMRG-cross} to obtain the TT decomposition of tensors resulting from suitable discretizations of the target function. We assess the performance of the method on a range of numerical examples: a modifed set of Genz functions with dimension up to 100, and functions with mixed Fourier modes or with local features. We observe significant improvements in performance over an anisotropic adaptive Smolyak approach. The method is also used to approximate the solution of an elliptic PDE with random input data. The open source software and examples presented in this work are available online.", keywords = "approximation theory; tensor-train decomposition; orthogonal polynomials; uncertainty quantification" }

*Scalable posterior approximations for large-scale Bayesian inverse problems via likelihood-informed parameter and state reduction,*Journal of Computational Physics, 315 (2016), pp. 363–387.

##### Scalable posterior approximations for large-scale Bayesian inverse problems via likelihood-informed parameter and state reduction

Two major bottlenecks to the solution of large-scale Bayesian inverse problems are the scaling of posterior sampling algorithms to high dimensional parameter spaces and the computational cost of forward model evaluations. Yet incomplete or noisy data, the state variation and parameter dependence of the forward model, and correlations in the prior collectively provide useful structure that can be exploited for dimension reduction in this setting–-both in the parameter space of the inverse problem and in the state space of the forward model. To this end, we show how to jointly construct low-dimensional subspaces of the parameter space and the state space in order to accelerate the Bayesian solution of the inverse problem. As a byproduct of state dimension reduction, we also show how to identify low-dimensional subspaces of the data in problems with high-dimensional observations. These subspaces enable approximation of the posterior as a product of two factors: (i) a projection of the posterior onto a low-dimensional parameter subspace, wherein the original likelihood is replaced by an approximation involving a reduced model; and (ii) the marginal prior distribution on the high-dimensional complement of the parameter subspace. We present and compare several strategies for constructing these subspaces using only a limited number of forward and adjoint model simulations. The resulting posterior approximations can rapidly be characterized using standard sampling techniques, e.g., Markov chain Monte Carlo. Two numerical examples demonstrate the accuracy and efficiency of our approach: inversion of an integral equation in atmospheric remote sensing, where the data dimension is very high; and the inference of a heterogeneous transmissivity field in a groundwater system, which involves a partial differential equation forward model with high dimensional state and parameters.

##### BibTeX entry

@article{art_44, author = "T. Cui and Y. M. Marzouk and K. Willcox", journal = "Journal of Computational Physics", month = "6", pages = "363--387", title = "Scalable posterior approximations for large-scale Bayesian inverse problems via likelihood-informed parameter and state reduction", volume = "315", year = "2016", doi = "doi:10.1016/j.jcp.2016.03.055", abstract = "Two major bottlenecks to the solution of large-scale Bayesian inverse problems are the scaling of posterior sampling algorithms to high dimensional parameter spaces and the computational cost of forward model evaluations. Yet incomplete or noisy data, the state variation and parameter dependence of the forward model, and correlations in the prior collectively provide useful structure that can be exploited for dimension reduction in this setting---both in the parameter space of the inverse problem and in the state space of the forward model. To this end, we show how to jointly construct low-dimensional subspaces of the parameter space and the state space in order to accelerate the Bayesian solution of the inverse problem. As a byproduct of state dimension reduction, we also show how to identify low-dimensional subspaces of the data in problems with high-dimensional observations. These subspaces enable approximation of the posterior as a product of two factors: (i) a projection of the posterior onto a low-dimensional parameter subspace, wherein the original likelihood is replaced by an approximation involving a reduced model; and (ii) the marginal prior distribution on the high-dimensional complement of the parameter subspace. We present and compare several strategies for constructing these subspaces using only a limited number of forward and adjoint model simulations. The resulting posterior approximations can rapidly be characterized using standard sampling techniques, e.g., Markov chain Monte Carlo. Two numerical examples demonstrate the accuracy and efficiency of our approach: inversion of an integral equation in atmospheric remote sensing, where the data dimension is very high; and the inference of a heterogeneous transmissivity field in a groundwater system, which involves a partial differential equation forward model with high dimensional state and parameters.", keywords = "Inverse problems, Bayesian inference, dimension reduction, model reduction, low-rank approximation, Markov chain Monte Carlo" }

*Mercer kernels and integrated variance experimental design: connections between Gaussian process regression and polynomial approximation,*SIAM/ASA Journal on Uncertainty Quantification, 4 (2016), pp. 796–828.

##### Mercer kernels and integrated variance experimental design: connections between Gaussian process regression and polynomial approximation

This paper examines experimental design procedures used to develop surrogates of computational models, exploring the interplay between experimental designs and approximation algorithms. We focus on two widely used approximation approaches, Gaussian process (GP) regression and nonintrusive polynomial approximation. First, we introduce algorithms for minimizing a posterior integrated variance (IVAR) design criterion for GP regression. Our formulation treats design as a continuous optimization problem that can be solved with gradient-based methods on complex input domains without resorting to greedy approximations. We show that minimizing IVAR in this way yields point sets with good interpolation properties and that it enables more accurate GP regression than designs based on entropy minimization or mutual information maximization. Second, using a Mercer kernel/eigenfunction perspective on GP regression, we identify conditions under which GP regression coincides with pseudospectral polynomial approximation. Departures from these conditions can be understood as changes either to the kernel or to the experimental design itself. We then show how IVAR-optimal designs, while sacrificing discrete orthogonality of the kernel eigenfunctions, can yield lower approximation error than orthogonalizing point sets. Finally, we compare the performance of adaptive GP regression and adaptive pseudospectral approximation for several classes of target functions, identifying features that are favorable to the GP + IVAR approach.

##### BibTeX entry

@article{art_43, author = "A. Gorodetsky and Y. M. Marzouk", journal = "SIAM/ASA Journal on Uncertainty Quantification", number = "1", pages = "796--828", title = "Mercer kernels and integrated variance experimental design: connections between Gaussian process regression and polynomial approximation", volume = "4", year = "2016", doi = "10.1137/15M1017119", abstract = "This paper examines experimental design procedures used to develop surrogates of computational models, exploring the interplay between experimental designs and approximation algorithms. We focus on two widely used approximation approaches, Gaussian process (GP) regression and nonintrusive polynomial approximation. First, we introduce algorithms for minimizing a posterior integrated variance (IVAR) design criterion for GP regression. Our formulation treats design as a continuous optimization problem that can be solved with gradient-based methods on complex input domains without resorting to greedy approximations. We show that minimizing IVAR in this way yields point sets with good interpolation properties and that it enables more accurate GP regression than designs based on entropy minimization or mutual information maximization. Second, using a Mercer kernel/eigenfunction perspective on GP regression, we identify conditions under which GP regression coincides with pseudospectral polynomial approximation. Departures from these conditions can be understood as changes either to the kernel or to the experimental design itself. We then show how IVAR-optimal designs, while sacrificing discrete orthogonality of the kernel eigenfunctions, can yield lower approximation error than orthogonalizing point sets. Finally, we compare the performance of adaptive GP regression and adaptive pseudospectral approximation for several classes of target functions, identifying features that are favorable to the GP + IVAR approach.", keywords = "Gaussian process regression; experimental design; computer experiments; approximation theory; polynomial approximation; kernel interpolation; uncertainty quantification" }

*On dimension reduction in Gaussian filters,*Inverse Problems, 32 (2016), pp. 045003.

##### On dimension reduction in Gaussian filters

A priori dimension reduction is a widely adopted technique for reducing the computational complexity of stationary inverse problems. In this setting, the solution of an inverse problem is parameterized by a low-dimensional basis that is often obtained from the truncated Karhunen-Loève expansion of the prior distribution. For high-dimensional inverse problems equipped with smoothing priors, this technique can lead to drastic reductions in parameter dimension and significant computational savings.

In this paper, we extend the concept of a priori dimension reduction to non-stationary inverse problems, in which the goal is to sequentially infer the state of a dynamical system. Our approach proceeds in an offline-online fashion. We first identify a low-dimensional subspace in the state space before solving the inverse problem (the offline phase), using either the method of "snapshots" or regularized covariance estimation. Then this subspace is used to reduce the computational complexity of various filtering algorithms—including the Kalman filter, extended Kalman filter, and ensemble Kalman filter—within a novel subspace-constrained Bayesian prediction-and-update procedure (the online phase). We demonstrate the performance of our new dimension reduction approach on various numerical examples. In some test cases, our approach reduces the dimensionality of the original problem by orders of magnitude and yields up to two orders of magnitude in computational savings.

In this paper, we extend the concept of a priori dimension reduction to non-stationary inverse problems, in which the goal is to sequentially infer the state of a dynamical system. Our approach proceeds in an offline-online fashion. We first identify a low-dimensional subspace in the state space before solving the inverse problem (the offline phase), using either the method of "snapshots" or regularized covariance estimation. Then this subspace is used to reduce the computational complexity of various filtering algorithms—including the Kalman filter, extended Kalman filter, and ensemble Kalman filter—within a novel subspace-constrained Bayesian prediction-and-update procedure (the online phase). We demonstrate the performance of our new dimension reduction approach on various numerical examples. In some test cases, our approach reduces the dimensionality of the original problem by orders of magnitude and yields up to two orders of magnitude in computational savings.

##### BibTeX entry

@article{art_42, author = "A. Solonen and T. Cui and J. Hakkarainen and Y. M. Marzouk", journal = "Inverse Problems", month = "3", number = "4", pages = "045003", title = "On dimension reduction in Gaussian filters", volume = "32", year = "2016", doi = "10.1088/0266-5611/32/4/045003", abstract = "A priori dimension reduction is a widely adopted technique for reducing the computational complexity of stationary inverse problems. In this setting, the solution of an inverse problem is parameterized by a low-dimensional basis that is often obtained from the truncated Karhunen-Loève expansion of the prior distribution. For high-dimensional inverse problems equipped with smoothing priors, this technique can lead to drastic reductions in parameter dimension and significant computational savings. In this paper, we extend the concept of a priori dimension reduction to non-stationary inverse problems, in which the goal is to sequentially infer the state of a dynamical system. Our approach proceeds in an offline-online fashion. We first identify a low-dimensional subspace in the state space before solving the inverse problem (the offline phase), using either the method of "snapshots" or regularized covariance estimation. Then this subspace is used to reduce the computational complexity of various filtering algorithms—including the Kalman filter, extended Kalman filter, and ensemble Kalman filter—within a novel subspace-constrained Bayesian prediction-and-update procedure (the online phase). We demonstrate the performance of our new dimension reduction approach on various numerical examples. In some test cases, our approach reduces the dimensionality of the original problem by orders of magnitude and yields up to two orders of magnitude in computational savings." }

*Multifidelity importance sampling,*Computer Methods in Applied Mechanics and Engineering, 300 (2016), pp. 490–509.

##### Multifidelity importance sampling

Estimating statistics of model outputs with the Monte Carlo method often requires a large number of model evaluations. This leads to long runtimes if the model is expensive to evaluate. Importance sampling is one approach that can lead to a reduction in the number of model evaluations. Importance sampling uses a biasing distribution to sample the model more efficiently, but generating such a biasing distribution can be difficult and usually also requires model evaluations. A different strategy to speed up Monte Carlo sampling is to replace the computationally expensive high-fidelity model with a computationally cheap surrogate model; however, because the surrogate model outputs are only approximations of the high-fidelity model outputs, the estimate obtained using a surrogate model is in general biased with respect to the estimate obtained using the high-fidelity model. We introduce a multifidelity importance sampling (MFIS) method, which combines evaluations of both the high-fidelity and a surrogate model. It uses a surrogate model to facilitate the construction of the biasing distribution, but relies on a small number of evaluations of the high-fidelity model to derive an unbiased estimate of the statistics of interest. We prove that the MFIS estimate is unbiased even in the absence of accuracy guarantees on the surrogate model itself. The MFIS method can be used with any type of surrogate model, such as projection-based reduced-order models and data-fit models. Furthermore, the MFIS method is applicable to black-box models, i.e., where only inputs and the corresponding outputs of the high-fidelity and the surrogate model are available but not the details of the models themselves. We demonstrate on nonlinear and time-dependent problems that our MFIS method achieves speedups of up to several orders of magnitude compared to Monte Carlo with importance sampling that uses the high-fidelity model only.

##### BibTeX entry

@article{art_41, author = "B. Peherstorfer and T. Cui and Y. M. Marzouk and K. Willcox", journal = "Computer Methods in Applied Mechanics and Engineering", pages = "490--509", title = "Multifidelity importance sampling", volume = "300", year = "2016", doi = "http://dx.doi.org/10.1016/j.cma.2015.12.002", abstract = "Estimating statistics of model outputs with the Monte Carlo method often requires a large number of model evaluations. This leads to long runtimes if the model is expensive to evaluate. Importance sampling is one approach that can lead to a reduction in the number of model evaluations. Importance sampling uses a biasing distribution to sample the model more efficiently, but generating such a biasing distribution can be difficult and usually also requires model evaluations. A different strategy to speed up Monte Carlo sampling is to replace the computationally expensive high-fidelity model with a computationally cheap surrogate model; however, because the surrogate model outputs are only approximations of the high-fidelity model outputs, the estimate obtained using a surrogate model is in general biased with respect to the estimate obtained using the high-fidelity model. We introduce a multifidelity importance sampling (MFIS) method, which combines evaluations of both the high-fidelity and a surrogate model. It uses a surrogate model to facilitate the construction of the biasing distribution, but relies on a small number of evaluations of the high-fidelity model to derive an unbiased estimate of the statistics of interest. We prove that the MFIS estimate is unbiased even in the absence of accuracy guarantees on the surrogate model itself. The MFIS method can be used with any type of surrogate model, such as projection-based reduced-order models and data-fit models. Furthermore, the MFIS method is applicable to black-box models, i.e., where only inputs and the corresponding outputs of the high-fidelity and the surrogate model are available but not the details of the models themselves. We demonstrate on nonlinear and time-dependent problems that our MFIS method achieves speedups of up to several orders of magnitude compared to Monte Carlo with importance sampling that uses the high-fidelity model only.", keywords = "Monte Carlo; importance sampling; surrogate modeling; multifidelity methods" }

*Dimension-independent likelihood-informed MCMC,*Journal of Computational Physics, 304 (2016), pp. 109–137.

##### Dimension-independent likelihood-informed MCMC

Many Bayesian inference problems require exploring the posterior distribution of high-dimensional parameters that represent the discretization of an underlying function. This work introduces a family of Markov chain Monte Carlo (MCMC) samplers that can adapt to the particular structure of a posterior distribution over functions. Two distinct lines of research intersect in the methods developed here. First, we introduce a general class of operator-weighted proposal distributions that are well defined on function space, such that the performance of the resulting MCMC samplers is independent of the discretization of the function. Second, by exploiting local Hessian information and any associated low-dimensional structure in the change from prior to posterior distributions, we develop an inhomogeneous discretization scheme for the Langevin stochastic differential equation that yields operator-weighted proposals adapted to the non-Gaussian structure of the posterior. The resulting dimension-independent, likelihood-informed (DILI) MCMC samplers may be useful for a large class of high-dimensional problems where the target probability measure has a density with respect to a Gaussian reference measure. Two nonlinear inverse problems are used to demonstrate the efficiency of these DILI samplers: an elliptic PDE coefficient inverse problem and path reconstruction in a conditioned diffusion.

##### BibTeX entry

@article{art_38, author = "T. Cui and K. J. H. Law and Y. M. Marzouk", journal = "Journal of Computational Physics", month = "1", pages = "109--137", title = "Dimension-independent likelihood-informed MCMC", volume = "304", year = "2016", doi = "doi:10.1016/j.jcp.2015.10.008", abstract = "Many Bayesian inference problems require exploring the posterior distribution of high-dimensional parameters that represent the discretization of an underlying function. This work introduces a family of Markov chain Monte Carlo (MCMC) samplers that can adapt to the particular structure of a posterior distribution over functions. Two distinct lines of research intersect in the methods developed here. First, we introduce a general class of operator-weighted proposal distributions that are well defined on function space, such that the performance of the resulting MCMC samplers is independent of the discretization of the function. Second, by exploiting local Hessian information and any associated low-dimensional structure in the change from prior to posterior distributions, we develop an inhomogeneous discretization scheme for the Langevin stochastic differential equation that yields operator-weighted proposals adapted to the non-Gaussian structure of the posterior. The resulting dimension-independent, likelihood-informed (DILI) MCMC samplers may be useful for a large class of high-dimensional problems where the target probability measure has a density with respect to a Gaussian reference measure. Two nonlinear inverse problems are used to demonstrate the efficiency of these DILI samplers: an elliptic PDE coefficient inverse problem and path reconstruction in a conditioned diffusion.", keywords = "Markov chain Monte Carlo, likelihood-informed subspace, infinite-dimensional inverse problems, Langevin SDE, conditioned diffusion" }

*Accelerating asymptotically exact MCMC for computationally intensive models via local approximations,*Journal of the American Statistical Association, 111 (2016), pp. 1591–1607.

##### Accelerating asymptotically exact MCMC for computationally intensive models via local approximations

We construct a new framework for accelerating Markov chain Monte Carlo in posterior sampling problems where standard methods are limited by the computational cost of the likelihood, or of numerical models embedded therein. Our approach introduces local approximations of these models into the Metropolis-Hastings kernel, borrowing ideas from deterministic approximation theory, optimization, and experimental design. Previous efforts at integrating approximate models into inference typically sacrifice either the sampler’s exactness or efficiency; our work seeks to address these limitations by exploiting useful convergence characteristics of local approximations. We prove the ergodicity of our approximate Markov chain, showing that it samples asymptotically from the \emph{exact} posterior distribution of interest. We describe variations of the algorithm that employ either local polynomial approximations or local Gaussian process regressors. Our theoretical results reinforce the key observation underlying this paper: when the likelihood has some \emph{local} regularity, the number of model evaluations per MCMC step can be greatly reduced without biasing the Monte Carlo average. Numerical experiments demonstrate multiple order-of-magnitude reductions in the number of forward model evaluations used in representative ODE and PDE inference problems, with both synthetic and real data.

##### BibTeX entry

@article{art_36, author = "P. Conrad and Y. M. Marzouk and N. Pillai and A. Smith", journal = "Journal of the American Statistical Association", number = "516", pages = "1591--1607", title = "Accelerating asymptotically exact MCMC for computationally intensive models via local approximations", volume = "111", year = "2016", doi = "10.1080/01621459.2015.1096787", abstract = "We construct a new framework for accelerating Markov chain Monte Carlo in posterior sampling problems where standard methods are limited by the computational cost of the likelihood, or of numerical models embedded therein. Our approach introduces local approximations of these models into the Metropolis-Hastings kernel, borrowing ideas from deterministic approximation theory, optimization, and experimental design. Previous efforts at integrating approximate models into inference typically sacrifice either the sampler's exactness or efficiency; our work seeks to address these limitations by exploiting useful convergence characteristics of local approximations. We prove the ergodicity of our approximate Markov chain, showing that it samples asymptotically from the \emph{exact} posterior distribution of interest. We describe variations of the algorithm that employ either local polynomial approximations or local Gaussian process regressors. Our theoretical results reinforce the key observation underlying this paper: when the likelihood has some \emph{local} regularity, the number of model evaluations per MCMC step can be greatly reduced without biasing the Monte Carlo average. Numerical experiments demonstrate multiple order-of-magnitude reductions in the number of forward model evaluations used in representative ODE and PDE inference problems, with both synthetic and real data.", keywords = "Markov chain Monte Carlo, experimental design, approximation theory, local approximation, computer experiments, emulators" }

*Optimal low-rank approximations of Bayesian linear inverse problems,*SIAM Journal on Scientific Computing, 37 (2015), pp. A2451–A2487.

##### Optimal low-rank approximations of Bayesian linear inverse problems

In the Bayesian approach to inverse problems, data are often informative, relative to the prior, only on a low-dimensional subspace of the parameter space. Significant computational savings can be achieved by using this subspace to characterize and approximate the posterior distribution of the parameters. We first investigate approximation of the posterior covariance matrix as a low-rank update of the prior covariance matrix. We prove optimality of a particular update, based on the leading eigendirections of the matrix pencil defined by the Hessian of the negative log-likelihood and the prior precision, for a broad class of loss functions. This class includes the F\"{o}rstner metric for symmetric positive definite matrices, as well as the Kullback-Leibler divergence and the Hellinger distance between the associated distributions. We also propose two fast approximations of the posterior mean and prove their optimality with respect to a weighted Bayes risk under squared-error loss. These approximations are deployed in an offline-online manner, where a more costly but data-independent offline calculation is followed by fast online evaluations. As a result, these approximations are particularly useful when repeated posterior mean evaluations are required for multiple data sets. We demonstrate our theoretical results with several numerical examples, including high-dimensional X-ray tomography and an inverse heat conduction problem. In both of these examples, the intrinsic low-dimensional structure of the inference problem can be exploited while producing results that are essentially indistinguishable from solutions computed in the full space.

##### BibTeX entry

@article{art_35, author = "A. Spantini and A. Solonen and T. Cui and J. Martin and L. Tenorio and Y. M. Marzouk", journal = "SIAM Journal on Scientific Computing", number = "6", pages = "A2451--A2487", title = "Optimal low-rank approximations of Bayesian linear inverse problems", volume = "37", year = "2015", doi = "10.1137/140977308", abstract = "In the Bayesian approach to inverse problems, data are often informative, relative to the prior, only on a low-dimensional subspace of the parameter space. Significant computational savings can be achieved by using this subspace to characterize and approximate the posterior distribution of the parameters. We first investigate approximation of the posterior covariance matrix as a low-rank update of the prior covariance matrix. We prove optimality of a particular update, based on the leading eigendirections of the matrix pencil defined by the Hessian of the negative log-likelihood and the prior precision, for a broad class of loss functions. This class includes the F\"{o}rstner metric for symmetric positive definite matrices, as well as the Kullback-Leibler divergence and the Hellinger distance between the associated distributions. We also propose two fast approximations of the posterior mean and prove their optimality with respect to a weighted Bayes risk under squared-error loss. These approximations are deployed in an offline-online manner, where a more costly but data-independent offline calculation is followed by fast online evaluations. As a result, these approximations are particularly useful when repeated posterior mean evaluations are required for multiple data sets. We demonstrate our theoretical results with several numerical examples, including high-dimensional X-ray tomography and an inverse heat conduction problem. In both of these examples, the intrinsic low-dimensional structure of the inference problem can be exploited while producing results that are essentially indistinguishable from solutions computed in the full space.", keywords = "inverse problems, Bayesian inference, low-rank approximation, covariance approximation, Förstner-Moonen metric, posterior mean approximation, Bayes risk, optimality" }

*Uncertainty quantification of silicon photonic devices with correlated and non-Gaussian random parameters,*Optics Express, 23 (2015), pp. 4242–4254.

##### Uncertainty quantification of silicon photonic devices with correlated and non-Gaussian random parameters

Process variations can significantly degrade device performance and chip yield in silicon photonics. In order to reduce the design and production costs, it is highly desirable to predict the statistical behavior of a device before the final fabrication. Monte Carlo is the mainstream computational technique used to estimate the uncertainties caused by process variations. However, it is very often too expensive due to its slow convergence rate. Recently, stochastic spectral methods based on polynomial chaos expansions have emerged as a promising alternative, and they have shown significant speedup over Monte Carlo in many engineering problems. The existing literature mostly assumes that the random parameters are mutually independent. However, in practical applications such assumption may not be necessarily accurate. In this paper, we develop an efficient numerical technique based on stochastic collocation to simulate silicon photonics with correlated and non-Gaussian random parameters. The effectiveness of our proposed technique is demonstrated by the simulation results of a silicon-on-insulator based directional coupler example. Since the mathematic formulation in this paper is very generic, our proposed algorithm can be applied to a large class of photonic design cases as well as to many other engineering problems.

##### BibTeX entry

@article{art_34, author = "T. Weng and Z. Zhang and Z. Su and Y. M. Marzouk and A. Melloni and L. Daniel", journal = "Optics Express", number = "4", pages = "4242--4254", title = "Uncertainty quantification of silicon photonic devices with correlated and non-Gaussian random parameters", volume = "23", year = "2015", doi = "10.1364/OE.23.004242", abstract = "Process variations can significantly degrade device performance and chip yield in silicon photonics. In order to reduce the design and production costs, it is highly desirable to predict the statistical behavior of a device before the final fabrication. Monte Carlo is the mainstream computational technique used to estimate the uncertainties caused by process variations. However, it is very often too expensive due to its slow convergence rate. Recently, stochastic spectral methods based on polynomial chaos expansions have emerged as a promising alternative, and they have shown significant speedup over Monte Carlo in many engineering problems. The existing literature mostly assumes that the random parameters are mutually independent. However, in practical applications such assumption may not be necessarily accurate. In this paper, we develop an efficient numerical technique based on stochastic collocation to simulate silicon photonics with correlated and non-Gaussian random parameters. The effectiveness of our proposed technique is demonstrated by the simulation results of a silicon-on-insulator based directional coupler example. Since the mathematic formulation in this paper is very generic, our proposed algorithm can be applied to a large class of photonic design cases as well as to many other engineering problems." }

*A new network approach to Bayesian inference in partial differential equations,*International Journal for Numerical Methods in Engineering, 104 (2015), pp. 313–329.

##### A new network approach to Bayesian inference in partial differential equations

We introduce a novel numerical approach to Bayesian parameter estimation in partial differential equations. The main idea is to translate the equation into a state-discrete dynamic Bayesian network with the discretization of cellular probabilistic automata. There exists a vast pool of inference algorithms in the probabilistic graphical models framework which can be applied to the network. In particular, we reformulate the parameter estimation problem as a filtering problem, discuss requirements for our specific setup, and apply the Boyen-Koller algorithm. To demonstrate our ideas, the scheme is applied to the problem of arsenate advection and adsorption in a water pipe: from measurements of the concentration of dissolved arsenate at the outflow boundary condition, we infer the strength of an arsenate source at the inflow boundary condition.

##### BibTeX entry

@article{art_33, author = "D. Kohler and Y. M. Marzouk and J. Müller and U. Wever", journal = "International Journal for Numerical Methods in Engineering", month = "11", number = "5", pages = "313--329", title = "A new network approach to Bayesian inference in partial differential equations", volume = "104", year = "2015", doi = "10.1002/nme.4928", abstract = "We introduce a novel numerical approach to Bayesian parameter estimation in partial differential equations. The main idea is to translate the equation into a state-discrete dynamic Bayesian network with the discretization of cellular probabilistic automata. There exists a vast pool of inference algorithms in the probabilistic graphical models framework which can be applied to the network. In particular, we reformulate the parameter estimation problem as a filtering problem, discuss requirements for our specific setup, and apply the Boyen-Koller algorithm. To demonstrate our ideas, the scheme is applied to the problem of arsenate advection and adsorption in a water pipe: from measurements of the concentration of dissolved arsenate at the outflow boundary condition, we infer the strength of an arsenate source at the inflow boundary condition.", keywords = "partial differential equations, dynamic Bayesian networks, Boyen--Koller algorithm, cellular probabilistic automata" }

*Improved profile fitting and quantification of uncertainty in experimental measurements of impurity transport coefficients using Gaussian process regression,*Nuclear Fusion, 55 (2015), pp. 023012.

##### Improved profile fitting and quantification of uncertainty in experimental measurements of impurity transport coefficients using Gaussian process regression

The need to fit smooth temperature and density profiles to discrete observations is ubiquitous in plasma physics, but the prevailing techniques for this have many shortcomings that cast doubt on the statistical validity of the results. This issue is amplified in the context of validation of gyrokinetic transport models (Holland et al. 2009, Phys. Plasmas 16, 052301), where the strong sensitivity of the code outputs to input gradients means that inadequacies in the profile fitting technique can easily lead to an incorrect assessment of the degree of agreement with experimental measurements. In order to rectify the shortcomings of standard approaches to profile fitting, we have applied Gaussian process regression (GPR), a powerful nonparametric regression technique, to analyze an Alcator C-Mod L-mode discharge used for past gyrokinetic validation work (Howard et al. 2012, Nucl. Fusion 52, 063002). We show that the GPR techniques can reproduce the previous results while delivering more statistically rigorous fits and uncertainty estimates for both the value and the gradient of plasma profiles with an improved level of automation. We also discuss how the use of GPR can allow for dramatic increases in the rate of convergence of uncertainty propagation for any code that takes experimental profiles as inputs. The new GPR techniques for profile fitting and uncertainty propagation are quite useful and general, and we describe the steps to implementation in detail in this paper. These techniques have the potential to substantially improve the quality of uncertainty estimates on profile fits and the rate of convergence of uncertainty propagation, making them of great interest for wider use in fusion experiments and modeling efforts.

##### BibTeX entry

@article{art_32, author = "M. A. Chilenski and M. Greenwald and Y. M. Marzouk and N. T. Howard and A. E. White and J. E. Rice and J. R. Walk", journal = "Nuclear Fusion", number = "2", pages = "023012", title = "Improved profile fitting and quantification of uncertainty in experimental measurements of impurity transport coefficients using Gaussian process regression", volume = "55", year = "2015", doi = "10.1088/0029-5515/55/2/023012", abstract = "The need to fit smooth temperature and density profiles to discrete observations is ubiquitous in plasma physics, but the prevailing techniques for this have many shortcomings that cast doubt on the statistical validity of the results. This issue is amplified in the context of validation of gyrokinetic transport models (Holland et al. 2009, Phys. Plasmas 16, 052301), where the strong sensitivity of the code outputs to input gradients means that inadequacies in the profile fitting technique can easily lead to an incorrect assessment of the degree of agreement with experimental measurements. In order to rectify the shortcomings of standard approaches to profile fitting, we have applied Gaussian process regression (GPR), a powerful nonparametric regression technique, to analyze an Alcator C-Mod L-mode discharge used for past gyrokinetic validation work (Howard et al. 2012, Nucl. Fusion 52, 063002). We show that the GPR techniques can reproduce the previous results while delivering more statistically rigorous fits and uncertainty estimates for both the value and the gradient of plasma profiles with an improved level of automation. We also discuss how the use of GPR can allow for dramatic increases in the rate of convergence of uncertainty propagation for any code that takes experimental profiles as inputs. The new GPR techniques for profile fitting and uncertainty propagation are quite useful and general, and we describe the steps to implementation in detail in this paper. These techniques have the potential to substantially improve the quality of uncertainty estimates on profile fits and the rate of convergence of uncertainty propagation, making them of great interest for wider use in fusion experiments and modeling efforts.", keywords = "plasma physics, Gaussian process regression, uncertainty propagation, model validation" }

*Bayesian inference of substrate properties from film behavior,*Modelling and Simulation in Materials Science and Engineering, 23 (2015), pp. 015009.

##### Bayesian inference of substrate properties from film behavior

We demonstrate that, by observing the behavior of a film deposited on a substrate, certain features of the substrate may be inferred with quantified uncertainty using Bayesian methods. We carry out this demonstration on an illustrative film/substrate model, where the substrate is a Gaussian random field and the film is a two-component mixture that obeys the Cahn-Hilliard equation. We construct a stochastic reduced order model to describe the film/substrate interaction and use it to infer substrate properties from film behavior. This quantitative inference strategy may be adapted to other film/substrate systems.

##### BibTeX entry

@article{art_31, author = "R. Aggarwal and M. Demkowicz and Y. M. Marzouk", journal = "Modelling and Simulation in Materials Science and Engineering", pages = "015009", title = "Bayesian inference of substrate properties from film behavior", volume = "23", year = "2015", doi = "http://dx.doi.org/10.1088/0965-0393/23/1/015009", abstract = "We demonstrate that, by observing the behavior of a film deposited on a substrate, certain features of the substrate may be inferred with quantified uncertainty using Bayesian methods. We carry out this demonstration on an illustrative film/substrate model, where the substrate is a Gaussian random field and the film is a two-component mixture that obeys the Cahn-Hilliard equation. We construct a stochastic reduced order model to describe the film/substrate interaction and use it to infer substrate properties from film behavior. This quantitative inference strategy may be adapted to other film/substrate systems." }

*Bayesian inference of chemical kinetic models from proposed reactions,*Chemical Engineering Science, 123 (2015), pp. 170–190.

##### Bayesian inference of chemical kinetic models from proposed reactions

Bayesian inference provides a natural framework for combining experimental data with prior knowledge to develop chemical kinetic models and quantify the associated uncertainties, not only in parameter values but also in model structure. Most existing applications of Bayesian model selection methods to chemical kinetics have been limited to comparisons among a small set of models, however. The significant computational cost of evaluating posterior model probabilities renders traditional Bayesian methods infeasible when the model space becomes large. We present a new framework for tractable Bayesian model inference and uncertainty quantification using a large number of systematically generated model hypotheses. The approach involves imposing point-mass mixture priors over rate constants and exploring the resulting posterior distribution using an adaptive Markov chain Monte Carlo method. The posterior samples are used to identify plausible models, to quantify rate constant uncertainties, and to extract key diagnostic information about model structure–-such as the reactions and operating pathways most strongly supported by the data. We provide numerical demonstrations of the proposed framework by inferring kinetic models for catalytic steam and dry reforming of methane using available experimental data.

##### BibTeX entry

@article{art_30, author = "N. Galagali and Y. M. Marzouk", journal = "Chemical Engineering Science", month = "2", pages = "170--190", title = "Bayesian inference of chemical kinetic models from proposed reactions", volume = "123", year = "2015", doi = "doi:10.1016/j.ces.2014.10.030", abstract = "Bayesian inference provides a natural framework for combining experimental data with prior knowledge to develop chemical kinetic models and quantify the associated uncertainties, not only in parameter values but also in model structure. Most existing applications of Bayesian model selection methods to chemical kinetics have been limited to comparisons among a small set of models, however. The significant computational cost of evaluating posterior model probabilities renders traditional Bayesian methods infeasible when the model space becomes large. We present a new framework for tractable Bayesian model inference and uncertainty quantification using a large number of systematically generated model hypotheses. The approach involves imposing point-mass mixture priors over rate constants and exploring the resulting posterior distribution using an adaptive Markov chain Monte Carlo method. The posterior samples are used to identify plausible models, to quantify rate constant uncertainties, and to extract key diagnostic information about model structure---such as the reactions and operating pathways most strongly supported by the data. We provide numerical demonstrations of the proposed framework by inferring kinetic models for catalytic steam and dry reforming of methane using available experimental data.", keywords = "Bayesian inference, chemical kinetics, model selection, Markov chain Monte Carlo, adaptive MCMC, online expectation maximization" }

*Data-driven model reduction for the Bayesian solution of inverse problems,*International Journal for Numerical Methods in Engineering, 102 (2015), pp. 966–990.

##### Data-driven model reduction for the Bayesian solution of inverse problems

One of the major challenges in the Bayesian solution of inverse problems governed by partial differential equations (PDEs) is the computational cost of repeatedly evaluating numerical PDE models, as required by Markov chain Monte Carlo (MCMC) methods for posterior sampling. This paper proposes a data-driven projection-based model reduction technique to reduce this computational cost. The proposed technique has two distinctive features. First, the model reduction strategy is tailored to inverse problems: the snapshots used to construct the reduced-order model are computed adaptively from the posterior distribution. Posterior exploration and model reduction are thus pursued simultaneously. Second, to avoid repeated evaluations of the full-scale numerical model as in a standard MCMC method, we couple the full-scale model and the reduced-order model together in the MCMC algorithm. This maintains accurate inference while reducing its overall computational cost. In numerical experiments considering steady-state flow in a porous medium, the data-driven reduced-order model achieves better accuracy than a reduced-order model constructed using the classical approach. It also improves posterior sampling efficiency by several orders of magnitude compared to a standard MCMC method.

##### BibTeX entry

@article{art_29, author = "T. Cui and Y. M. Marzouk and K. Willcox", journal = "International Journal for Numerical Methods in Engineering", number = "5", pages = "966--990", title = "Data-driven model reduction for the Bayesian solution of inverse problems", volume = "102", year = "2015", doi = "doi:10.1002/nme.4748", abstract = "One of the major challenges in the Bayesian solution of inverse problems governed by partial differential equations (PDEs) is the computational cost of repeatedly evaluating numerical PDE models, as required by Markov chain Monte Carlo (MCMC) methods for posterior sampling. This paper proposes a data-driven projection-based model reduction technique to reduce this computational cost. The proposed technique has two distinctive features. First, the model reduction strategy is tailored to inverse problems: the snapshots used to construct the reduced-order model are computed adaptively from the posterior distribution. Posterior exploration and model reduction are thus pursued simultaneously. Second, to avoid repeated evaluations of the full-scale numerical model as in a standard MCMC method, we couple the full-scale model and the reduced-order model together in the MCMC algorithm. This maintains accurate inference while reducing its overall computational cost. In numerical experiments considering steady-state flow in a porous medium, the data-driven reduced-order model achieves better accuracy than a reduced-order model constructed using the classical approach. It also improves posterior sampling efficiency by several orders of magnitude compared to a standard MCMC method.", keywords = "model reduction; inverse problem; adaptive Markov chain Monte Carlo; approximate Bayesian inference" }

*Likelihood-informed dimension reduction for nonlinear inverse problems,*Inverse Problems, 29 (2014), pp. 114015.

##### Likelihood-informed dimension reduction for nonlinear inverse problems

The intrinsic dimensionality of an inverse problem is affected by prior information, the accuracy and number of observations, and the smoothing properties of the forward operator. From a Bayesian perspective, changes from the prior to the posterior may, in many problems, be confined to a relatively low-dimensional subspace of the parameter space. We present a dimension reduction approach that defines and identifies such a subspace, called the "likelihood-informed subspace" (LIS), by characterizing the relative influences of the prior and the likelihood over the support of the posterior distribution. This identification enables new and more efficient computational methods for Bayesian inference with nonlinear forward models and Gaussian priors. In particular, we approximate the posterior distribution as the product of a lower-dimensional posterior defined on the LIS and the prior distribution marginalized onto the complementary subspace. Markov chain Monte Carlo sampling can then proceed in lower dimensions, with significant gains in computational efficiency. We also introduce a Rao-Blackwellization strategy that de-randomizes Monte Carlo estimates of posterior expectations for additional variance reduction. We demonstrate the efficiency of our methods using two numerical examples: inference of permeability in a groundwater system governed by an elliptic PDE, and an atmospheric remote sensing problem based on Global Ozone Monitoring System (GOMOS) observations.

##### BibTeX entry

@article{art_28, author = "T. Cui and J. Martin and Y. M. Marzouk and A. Solonen and A. Spantini", journal = "Inverse Problems", pages = "114015", title = "Likelihood-informed dimension reduction for nonlinear inverse problems", volume = "29", year = "2014", doi = "doi:10.1088/0266-5611/30/11/114015", abstract = "The intrinsic dimensionality of an inverse problem is affected by prior information, the accuracy and number of observations, and the smoothing properties of the forward operator. From a Bayesian perspective, changes from the prior to the posterior may, in many problems, be confined to a relatively low-dimensional subspace of the parameter space. We present a dimension reduction approach that defines and identifies such a subspace, called the "likelihood-informed subspace" (LIS), by characterizing the relative influences of the prior and the likelihood over the support of the posterior distribution. This identification enables new and more efficient computational methods for Bayesian inference with nonlinear forward models and Gaussian priors. In particular, we approximate the posterior distribution as the product of a lower-dimensional posterior defined on the LIS and the prior distribution marginalized onto the complementary subspace. Markov chain Monte Carlo sampling can then proceed in lower dimensions, with significant gains in computational efficiency. We also introduce a Rao-Blackwellization strategy that de-randomizes Monte Carlo estimates of posterior expectations for additional variance reduction. We demonstrate the efficiency of our methods using two numerical examples: inference of permeability in a groundwater system governed by an elliptic PDE, and an atmospheric remote sensing problem based on Global Ozone Monitoring System (GOMOS) observations.", keywords = "Inverse problem, Bayesian inference, dimension reduction, low-rank approximation, Markov chain Monte Carlo, variance reduction" }

*Efficient localization of discontinuities in complex computational simulations,*SIAM Journal on Scientific Computing, 36 (2014), pp. A2584–A2610.

##### Efficient localization of discontinuities in complex computational simulations

Surrogate models for computational simulations are input-output approximations that allow computationally intensive analyses, such as uncertainty propagation and inference, to be performed efficiently. When a simulation output does not depend smoothly on its inputs, the error and convergence rate of many approximation methods deteriorate substantially. This paper details a method for efficiently localizing discontinuities in the input parameter domain, so that the model output can be approximated as a piecewise smooth function. The approach comprises an initialization phase, which uses polynomial annihilation to assign function values to different regions and thus seed an automated labeling procedure, followed by a refinement phase that adaptively updates a kernel support vector machine representation of the separating surface via active learning. The overall approach avoids structured grids and exploits any available simplicity in the geometry of the separating surface, thus reducing the number of model evaluations required to localize the discontinuity. The method is illustrated on examples of up to eleven dimensions, including algebraic models and ODE/PDE systems, and demonstrates improved scaling and efficiency over other discontinuity localization approaches.

##### BibTeX entry

@article{art_27, author = "A. Gorodetsky and Y. M. Marzouk", journal = "SIAM Journal on Scientific Computing", number = "6", pages = "A2584--A2610", title = "Efficient localization of discontinuities in complex computational simulations", volume = "36", year = "2014", abstract = "Surrogate models for computational simulations are input-output approximations that allow computationally intensive analyses, such as uncertainty propagation and inference, to be performed efficiently. When a simulation output does not depend smoothly on its inputs, the error and convergence rate of many approximation methods deteriorate substantially. This paper details a method for efficiently localizing discontinuities in the input parameter domain, so that the model output can be approximated as a piecewise smooth function. The approach comprises an initialization phase, which uses polynomial annihilation to assign function values to different regions and thus seed an automated labeling procedure, followed by a refinement phase that adaptively updates a kernel support vector machine representation of the separating surface via active learning. The overall approach avoids structured grids and exploits any available simplicity in the geometry of the separating surface, thus reducing the number of model evaluations required to localize the discontinuity. The method is illustrated on examples of up to eleven dimensions, including algebraic models and ODE/PDE systems, and demonstrates improved scaling and efficiency over other discontinuity localization approaches.", keywords = "discontinuity detection, polynomial annihilation, function approximation, support vector machines, active learning, uncertainty quantification" }

*Adaptive construction of surrogates for the Bayesian solution of inverse problems,*SIAM Journal on Scientific Computing, 36 (2014), pp. A1163–A1186.

##### Adaptive construction of surrogates for the Bayesian solution of inverse problems

The Bayesian approach to inverse problems typically relies on posterior sampling approaches, such as Markov chain Monte Carlo, for which the generation of each sample requires one or more evaluations of the parameter-to-observable map or forward model. When these evaluations are computationally intensive, approximations of the forward model are essential to accelerating sample-based inference. Yet the construction of globally accurate approximations for nonlinear forward models can be computationally prohibitive and in fact unnecessary, as the posterior distribution typically concentrates on a small fraction of the support of the prior distribution. We present a new approach that uses stochastic optimization to construct polynomial approximations over a sequence of measures adaptively determined from the data, eventually concentrating on the posterior distribution. The approach yields substantial gains in efficiency and accuracy over prior-based surrogates, as demonstrated via application to inverse problems in partial differential equations.

##### BibTeX entry

@article{art_26, author = "J. Li and Y. M. Marzouk", journal = "SIAM Journal on Scientific Computing", number = "3", pages = "A1163--A1186", title = "Adaptive construction of surrogates for the Bayesian solution of inverse problems", volume = "36", year = "2014", abstract = "The Bayesian approach to inverse problems typically relies on posterior sampling approaches, such as Markov chain Monte Carlo, for which the generation of each sample requires one or more evaluations of the parameter-to-observable map or forward model. When these evaluations are computationally intensive, approximations of the forward model are essential to accelerating sample-based inference. Yet the construction of globally accurate approximations for nonlinear forward models can be computationally prohibitive and in fact unnecessary, as the posterior distribution typically concentrates on a small fraction of the support of the prior distribution. We present a new approach that uses stochastic optimization to construct polynomial approximations over a sequence of measures adaptively determined from the data, eventually concentrating on the posterior distribution. The approach yields substantial gains in efficiency and accuracy over prior-based surrogates, as demonstrated via application to inverse problems in partial differential equations.", keywords = "Bayesian inference, cross-entropy method, importance sampling, inverse problem, Kullback-Leibler divergence, Markov chain Monte Carlo, polynomial chaos" }

*Gradient-based stochastic optimization methods in Bayesian experimental design,*International Journal for Uncertainty Quantification, 4 (2014), pp. 479–510.

##### Gradient-based stochastic optimization methods in Bayesian experimental design

Optimal experimental design (OED) seeks experiments expected to yield the most useful data for some purpose. In practical circumstances where experiments are time-consuming or resource-intensive, OED can yield enormous savings. We pursue OED for nonlinear systems from a Bayesian perspective, with the goal of choosing experiments that are optimal for parameter inference. Our objective in this context is the expected information gain in model parameters, which in general can only be estimated using Monte Carlo methods. Maximizing this objective thus becomes a stochastic optimization problem.

This paper develops gradient-based stochastic optimization methods for the design of experiments on a continuous parameter space. Given a Monte Carlo estimator of expected information gain, we use infinitesimal perturbation analysis to derive gradients of this estimator. We are then able to formulate two gradient-based stochastic optimization approaches: (i) Robbins-Monro stochastic approximation, and (ii) sample average approximation combined with a deterministic quasi-Newton method. A polynomial chaos approximation of the forward model accelerates objective and gradient evaluations in both cases. We discuss the implementation of these optimization methods, then conduct an empirical comparison of their performance. To demonstrate design in a nonlinear setting with partial differential equation forward models, we use the problem of sensor placement for source inversion. Numerical results yield useful guidelines on the choice of algorithm and sample sizes, assess the impact of estimator bias, and quantify tradeoffs of computational cost versus solution quality and robustness.

This paper develops gradient-based stochastic optimization methods for the design of experiments on a continuous parameter space. Given a Monte Carlo estimator of expected information gain, we use infinitesimal perturbation analysis to derive gradients of this estimator. We are then able to formulate two gradient-based stochastic optimization approaches: (i) Robbins-Monro stochastic approximation, and (ii) sample average approximation combined with a deterministic quasi-Newton method. A polynomial chaos approximation of the forward model accelerates objective and gradient evaluations in both cases. We discuss the implementation of these optimization methods, then conduct an empirical comparison of their performance. To demonstrate design in a nonlinear setting with partial differential equation forward models, we use the problem of sensor placement for source inversion. Numerical results yield useful guidelines on the choice of algorithm and sample sizes, assess the impact of estimator bias, and quantify tradeoffs of computational cost versus solution quality and robustness.

##### BibTeX entry

@article{art_25, author = "X. Huan and Y. M. Marzouk", journal = "International Journal for Uncertainty Quantification", number = "6", pages = "479--510", title = "Gradient-based stochastic optimization methods in Bayesian experimental design", volume = "4", year = "2014", doi = "10.1615/Int.J.UncertaintyQuantification.2014006730", abstract = "Optimal experimental design (OED) seeks experiments expected to yield the most useful data for some purpose. In practical circumstances where experiments are time-consuming or resource-intensive, OED can yield enormous savings. We pursue OED for nonlinear systems from a Bayesian perspective, with the goal of choosing experiments that are optimal for parameter inference. Our objective in this context is the expected information gain in model parameters, which in general can only be estimated using Monte Carlo methods. Maximizing this objective thus becomes a stochastic optimization problem. This paper develops gradient-based stochastic optimization methods for the design of experiments on a continuous parameter space. Given a Monte Carlo estimator of expected information gain, we use infinitesimal perturbation analysis to derive gradients of this estimator. We are then able to formulate two gradient-based stochastic optimization approaches: (i) Robbins-Monro stochastic approximation, and (ii) sample average approximation combined with a deterministic quasi-Newton method. A polynomial chaos approximation of the forward model accelerates objective and gradient evaluations in both cases. We discuss the implementation of these optimization methods, then conduct an empirical comparison of their performance. To demonstrate design in a nonlinear setting with partial differential equation forward models, we use the problem of sensor placement for source inversion. Numerical results yield useful guidelines on the choice of algorithm and sample sizes, assess the impact of estimator bias, and quantify tradeoffs of computational cost versus solution quality and robustness." }

*A priori testing of sparse adaptive polynomial chaos expansions using an ocean general circulation model database,*Computational Geosciences, 17 (2013), pp. 899-911.

##### A priori testing of sparse adaptive polynomial chaos expansions using an ocean general circulation model database

This work explores the implementation of an adaptive strategy to design sparse ensembles of oceanic simulations suitable for constructing polynomial chaos surrogates. We use a recently developed pseudo-spectral algorithm that is based on a direct application of the Smolyak sparse grid formula and that allows the use of arbitrary admissible sparse grids. The adaptive algorithm is tested using an existing simulation database of the oceanic response to Hurricane Ivan in the Gulf of Mex- ico. The a priori tests demonstrate that sparse and adaptive pseudo-spectral constructions lead to substantial savings over isotropic sparse sampling in the present setting.

##### BibTeX entry

@article{art_24, author = "J. Winokur and P. Conrad and I. Sraj and O. Knio and A. Srinivasan and W. C. Thacker and Y. M. Marzouk and M. Iskandarani,", journal = "Computational Geosciences", number = "6", pages = "899-911", title = "A priori testing of sparse adaptive polynomial chaos expansions using an ocean general circulation model database", volume = "17", year = "2013", doi = "10.1007/s10596-013-9361-3", abstract = "This work explores the implementation of an adaptive strategy to design sparse ensembles of oceanic simulations suitable for constructing polynomial chaos surrogates. We use a recently developed pseudo-spectral algorithm that is based on a direct application of the Smolyak sparse grid formula and that allows the use of arbitrary admissible sparse grids. The adaptive algorithm is tested using an existing simulation database of the oceanic response to Hurricane Ivan in the Gulf of Mex- ico. The a priori tests demonstrate that sparse and adaptive pseudo-spectral constructions lead to substantial savings over isotropic sparse sampling in the present setting." }

*Adaptive Smolyak pseudospectral approximations,*SIAM Journal on Scientific Computing, 35 (2013), pp. A2643–A2670.

##### Adaptive Smolyak pseudospectral approximations

Polynomial approximations of computationally intensive models are central to uncertainty quantification. This paper describes an adaptive method for non-intrusive pseudospectral approximation, based on Smolyak’s algorithm with generalized sparse grids. We rigorously analyze and extend the non-adaptive method proposed in [Constantine et al. 2012], and compare it to a common alternative approach for using sparse grids to construct polynomial approximations, direct quadrature. Analysis of direct quadrature shows that O(1) errors are an intrinsic property of some configurations of the method, as a consequence of internal aliasing. We provide precise conditions, based on the chosen polynomial basis and quadrature rules, under which this aliasing error occurs. We then establish theoretical results on the accuracy of Smolyak pseudospectral approximation, and show that the Smolyak approximation avoids internal aliasing and makes far more effective use of sparse function evaluations. These results are applicable to broad choices of quadrature rule and generalized sparse grids. Exploiting this flexibility, we introduce a greedy heuristic for adaptive refinement of the pseudospectral approximation. We numerically demonstrate convergence of the algorithm on the Genz test functions, and illustrate the accuracy and efficiency of the adaptive approach on a realistic chemical kinetics problem.

##### BibTeX entry

@article{art_23, author = "P. Conrad and Y. M. Marzouk", journal = "SIAM Journal on Scientific Computing", number = "6", pages = "A2643--A2670", title = "Adaptive Smolyak pseudospectral approximations", volume = "35", year = "2013", doi = "10.1137/120890715", abstract = "Polynomial approximations of computationally intensive models are central to uncertainty quantification. This paper describes an adaptive method for non-intrusive pseudospectral approximation, based on Smolyak's algorithm with generalized sparse grids. We rigorously analyze and extend the non-adaptive method proposed in [Constantine et al. 2012], and compare it to a common alternative approach for using sparse grids to construct polynomial approximations, direct quadrature. Analysis of direct quadrature shows that O(1) errors are an intrinsic property of some configurations of the method, as a consequence of internal aliasing. We provide precise conditions, based on the chosen polynomial basis and quadrature rules, under which this aliasing error occurs. We then establish theoretical results on the accuracy of Smolyak pseudospectral approximation, and show that the Smolyak approximation avoids internal aliasing and makes far more effective use of sparse function evaluations. These results are applicable to broad choices of quadrature rule and generalized sparse grids. Exploiting this flexibility, we introduce a greedy heuristic for adaptive refinement of the pseudospectral approximation. We numerically demonstrate convergence of the algorithm on the Genz test functions, and illustrate the accuracy and efficiency of the adaptive approach on a realistic chemical kinetics problem.", keywords = "Smolyak algorithms, sparse grids, orthogonal polynomials, pseudospectral approximation, approximation theory, uncertainty quantification" }

*Bayesian inverse problems with Monte Carlo forward models,*Inverse Problems and Imaging, 7 (2013), pp. 81–105.

##### Bayesian inverse problems with Monte Carlo forward models

The full application of Bayesian inference to inverse problems requires exploration of a posterior distribution that typically does not possess a standard form. In this context, Markov chain Monte Carlo (MCMC) methods are often used. These methods require many evaluations of a computationally intensive forward model to produce the equivalent of one independent sample from the posterior. We consider applications in which approximate forward models at multiple resolution levels are available, each endowed with a probabilistic error estimate. These situations occur, for example, when the forward model involves Monte Carlo integration. We present a novel MCMC method called $MC^3$ that uses low-resolution forward models to approximate draws from a posterior distribution built with the high-resolution forward model. The acceptance ratio is estimated with some statistical error; then a confidence interval for the true acceptance ratio is found, and acceptance is performed correctly with some confidence. The high-resolution models are rarely run and a significant speed up is achieved. Our multiple-resolution forward models themselves are built around a new importance sampling scheme that allows Monte Carlo forward models to be used efficiently in inverse problems. The method is used to solve an inverse transport problem that finds applications in atmospheric remote sensing. We present a path-recycling methodology to efficiently vary parameters in the transport equation. The forward transport equation is solved by a Monte Carlo method that is amenable to the use of $MC^3$ to solve the inverse transport problem using a Bayesian formalism.

##### BibTeX entry

@article{art_22, author = "G. Bal and I. Langmore and Y. M. Marzouk", journal = "Inverse Problems and Imaging", number = "1", pages = "81--105", title = "Bayesian inverse problems with Monte Carlo forward models", volume = "7", year = "2013", doi = "10.3934/ipi.2013.7.81", abstract = "The full application of Bayesian inference to inverse problems requires exploration of a posterior distribution that typically does not possess a standard form. In this context, Markov chain Monte Carlo (MCMC) methods are often used. These methods require many evaluations of a computationally intensive forward model to produce the equivalent of one independent sample from the posterior. We consider applications in which approximate forward models at multiple resolution levels are available, each endowed with a probabilistic error estimate. These situations occur, for example, when the forward model involves Monte Carlo integration. We present a novel MCMC method called $MC^3$ that uses low-resolution forward models to approximate draws from a posterior distribution built with the high-resolution forward model. The acceptance ratio is estimated with some statistical error; then a confidence interval for the true acceptance ratio is found, and acceptance is performed correctly with some confidence. The high-resolution models are rarely run and a significant speed up is achieved. Our multiple-resolution forward models themselves are built around a new importance sampling scheme that allows Monte Carlo forward models to be used efficiently in inverse problems. The method is used to solve an inverse transport problem that finds applications in atmospheric remote sensing. We present a path-recycling methodology to efficiently vary parameters in the transport equation. The forward transport equation is solved by a Monte Carlo method that is amenable to the use of $MC^3$ to solve the inverse transport problem using a Bayesian formalism.", keywords = "linear transport; perturbation Monte Carlo; Bayesian; importance sampling; inverse problems; Markov chain Monte Carlo" }

*Simulation-based optimal Bayesian experimental design for nonlinear systems,*Journal of Computational Physics, 232 (2013), pp. 288–317.

##### Simulation-based optimal Bayesian experimental design for nonlinear systems

The optimal selection of experimental conditions is essential to maximizing the value of data for inference and prediction, particularly in situations where experiments are time-consuming and expensive to conduct. We propose a general mathematical framework and an algorithmic approach for optimal experimental design with nonlinear simulation-based models; in particular, we focus on finding sets of experiments that provide the most information about targeted sets of parameters. Our framework employs a Bayesian statistical setting, which provides a foundation for inference from noisy, indirect, and incomplete data, and a natural mechanism for incorporating heterogeneous sources of information. An objective function is constructed from information theoretic measures, reflecting expected information gain from proposed combinations of experiments. Polynomial chaos approximations and a two-stage Monte Carlo sampling method are used to evaluate the expected information gain. Stochastic approximation algorithms are then used to make optimization feasible in computationally intensive and high-dimensional settings. These algorithms are demonstrated on model problems and on nonlinear parameter inference problems arising in detailed combustion kinetics.

##### BibTeX entry

@article{art_21, author = "X. Huan and Y. M. Marzouk", journal = "Journal of Computational Physics", month = "1", number = "1", pages = "288--317", title = "Simulation-based optimal Bayesian experimental design for nonlinear systems", volume = "232", year = "2013", doi = "10.1016/j.jcp.2012.08.013", abstract = "The optimal selection of experimental conditions is essential to maximizing the value of data for inference and prediction, particularly in situations where experiments are time-consuming and expensive to conduct. We propose a general mathematical framework and an algorithmic approach for optimal experimental design with nonlinear simulation-based models; in particular, we focus on finding sets of experiments that provide the most information about targeted sets of parameters. Our framework employs a Bayesian statistical setting, which provides a foundation for inference from noisy, indirect, and incomplete data, and a natural mechanism for incorporating heterogeneous sources of information. An objective function is constructed from information theoretic measures, reflecting expected information gain from proposed combinations of experiments. Polynomial chaos approximations and a two-stage Monte Carlo sampling method are used to evaluate the expected information gain. Stochastic approximation algorithms are then used to make optimization feasible in computationally intensive and high-dimensional settings. These algorithms are demonstrated on model problems and on nonlinear parameter inference problems arising in detailed combustion kinetics.", keywords = "Uncertainty quantification; Bayesian inference; Optimal experimental design; Nonlinear experimental design; Stochastic approximation; Shannon information; Chemical kinetics" }

*Bayesian inference with optimal maps,*Journal of Computational Physics, 231 (2012), pp. 7815–7850.

##### Bayesian inference with optimal maps

We present a new approach to Bayesian inference that entirely avoids Markov chain simulation, by constructing a map that pushes forward the prior measure to the posterior measure. Existence and uniqueness of a suitable measure-preserving map is established by formulating the problem in the context of optimal transport theory. We discuss various means of explicitly parameterizing the map and computing it efficiently through solution of an optimization problem, exploiting gradient information from the forward model when possible. The resulting algorithm overcomes many of the computational bottlenecks associated with Markov chain Monte Carlo. Advantages of a map-based representation of the posterior include analytical expressions for posterior moments and the ability to generate arbitrary numbers of independent posterior samples without additional likelihood evaluations or forward solves. The optimization approach also provides clear convergence criteria for posterior approximation and facilitates model selection through automatic evaluation of the marginal likelihood. We demonstrate the accuracy and efficiency of the approach on nonlinear inverse problems of varying dimension, involving the inference of parameters appearing in ordinary and partial differential equations.

##### BibTeX entry

@article{art_20, author = "T. A. Moselhy and Y. M. Marzouk", journal = "Journal of Computational Physics", number = "23", pages = "7815--7850", title = "Bayesian inference with optimal maps", volume = "231", year = "2012", doi = "10.1016/j.jcp.2012.07.022", abstract = "We present a new approach to Bayesian inference that entirely avoids Markov chain simulation, by constructing a map that pushes forward the prior measure to the posterior measure. Existence and uniqueness of a suitable measure-preserving map is established by formulating the problem in the context of optimal transport theory. We discuss various means of explicitly parameterizing the map and computing it efficiently through solution of an optimization problem, exploiting gradient information from the forward model when possible. The resulting algorithm overcomes many of the computational bottlenecks associated with Markov chain Monte Carlo. Advantages of a map-based representation of the posterior include analytical expressions for posterior moments and the ability to generate arbitrary numbers of independent posterior samples without additional likelihood evaluations or forward solves. The optimization approach also provides clear convergence criteria for posterior approximation and facilitates model selection through automatic evaluation of the marginal likelihood. We demonstrate the accuracy and efficiency of the approach on nonlinear inverse problems of varying dimension, involving the inference of parameters appearing in ordinary and partial differential equations.", keywords = "Bayesian inference; Optimal transport; Measure-preserving maps; Inverse problems; Polynomial chaos; Numerical optimization" }

*Bayesian reconstruction of binary media with unresolved fine-scale spatial structures,*Advances in Water Resources, 44 (2012), pp. 1–19.

##### Bayesian reconstruction of binary media with unresolved fine-scale spatial structures

##### BibTeX entry

@article{art_19, author = "J. Ray and S. McKenna and B. van Bloemen Waanders and Y. M. Marzouk", journal = "Advances in Water Resources", month = "8", pages = "1--19", title = "Bayesian reconstruction of binary media with unresolved fine-scale spatial structures", volume = "44", year = "2012", doi = "10.1016/j.advwatres.2012.04.009", keywords = "Upscaling; Binary media; Bayesian technique; Multiscale inference" }

*Sequential data assimilation with multiple models,*Journal of Computational Physics, 231 (2012), pp. 6401–6418.

##### Sequential data assimilation with multiple models

Data assimilation is an essential tool for predicting the behavior of real physical systems given approximate simulation models and limited observations. For many complex systems, there may exist several models, each with different properties and predictive capabilities. It is desirable to incorporate multiple models into the assimilation procedure in order to obtain a more accurate prediction of the physics than any model alone can provide. In this paper, we propose a framework for conducting sequential data assimilation with multiple models and sources of data. The assimilated solution is a linear combination of all model predictions and data. One notable feature is that the combination takes the most general form with matrix weights. By doing so the method can readily utilize different weights in different sections of the solution state vectors, allow the models and data to have different dimensions, and deal with the case of a singular state covariance. We prove that the proposed assimilation method, termed direct assimilation, minimizes a variational functional, a generalized version of the one used in the classical Kalman filter. We also propose an efficient iterative assimilation method that assimilates two models at a time until all models and data are assimilated. The mathematical equivalence of the iterative method and the direct method is established. Numerical examples are presented to demonstrate the effectiveness of the new method.

##### BibTeX entry

@article{art_18, author = "A. Narayan and Y. M. Marzouk and D. Xiu", journal = "Journal of Computational Physics", number = "19", pages = "6401--6418", title = "Sequential data assimilation with multiple models", volume = "231", year = "2012", doi = "10.1016/j.jcp.2012.06.002", abstract = "Data assimilation is an essential tool for predicting the behavior of real physical systems given approximate simulation models and limited observations. For many complex systems, there may exist several models, each with different properties and predictive capabilities. It is desirable to incorporate multiple models into the assimilation procedure in order to obtain a more accurate prediction of the physics than any model alone can provide. In this paper, we propose a framework for conducting sequential data assimilation with multiple models and sources of data. The assimilated solution is a linear combination of all model predictions and data. One notable feature is that the combination takes the most general form with matrix weights. By doing so the method can readily utilize different weights in different sections of the solution state vectors, allow the models and data to have different dimensions, and deal with the case of a singular state covariance. We prove that the proposed assimilation method, termed direct assimilation, minimizes a variational functional, a generalized version of the one used in the classical Kalman filter. We also propose an efficient iterative assimilation method that assimilates two models at a time until all models and data are assimilated. The mathematical equivalence of the iterative method and the direct method is established. Numerical examples are presented to demonstrate the effectiveness of the new method.", keywords = "Uncertainty quantification; Data assimilation; Kalman filter; Model averaging" }

*Data-free inference of the joint distribution of uncertain model parameters,*Journal of Computational Physics, 231 (2012), pp. 2180-2198.

##### Data-free inference of the joint distribution of uncertain model parameters

A critical problem in accurately estimating uncertainty in model predictions is the lack of details in the literature on the correlation (or full joint distribution) of uncertain model parameters. In this paper we describe a framework and a class of algorithms for analyzing such “missing data” problems in the setting of Bayesian statistics. The analysis focuses on the family of posterior distributions consistent with given statistics (e.g. nominal values, confidence intervals). The combining of consistent distributions is addressed via techniques from the opinion pooling literature. The developed approach allows subsequent propagation of uncertainty in model inputs consistent with reported statistics, in the absence of data.

##### BibTeX entry

@article{art_17, author = "R. D. Berry and H. N. Najm and B. J. Debusschere and Y. M. Marzouk and H. Adalsteinsson", journal = "Journal of Computational Physics", number = "5", pages = "2180-2198", title = "Data-free inference of the joint distribution of uncertain model parameters", volume = "231", year = "2012", doi = "10.1016/j.jcp.2011.10.031", abstract = "A critical problem in accurately estimating uncertainty in model predictions is the lack of details in the literature on the correlation (or full joint distribution) of uncertain model parameters. In this paper we describe a framework and a class of algorithms for analyzing such “missing data” problems in the setting of Bayesian statistics. The analysis focuses on the family of posterior distributions consistent with given statistics (e.g. nominal values, confidence intervals). The combining of consistent distributions is addressed via techniques from the opinion pooling literature. The developed approach allows subsequent propagation of uncertainty in model inputs consistent with reported statistics, in the absence of data.", keywords = "Uncertainty quantification; Bayesian statistics; Missing information" }

*Computational singular perturbation with non-parametric tabulation of slow manifolds for time integration of stiff chemical kinetics,*Combustion Theory and Modeling, 16 (2011), pp. 173-198.

##### Computational singular perturbation with non-parametric tabulation of slow manifolds for time integration of stiff chemical kinetics

This paper presents a novel tabulation strategy for the adaptive numerical integration of chemical kinetics using the computational singular perturbation (CSP) method. The strategy stores and reuses CSP quantities required to filter out fast dissipative processes, resulting in a non-stiff chemical source term. In particular, non-parametric regression on low-dimensional slow invariant manifolds (SIMs) in the chemical state space is used to approximate the CSP vectors spanning the fast chemical subspace and the associated fast chemical time-scales. The relevant manifold and its dimension varies depending on the local number of exhausted modes at every location in the chemical state space. Multiple manifolds are therefore tabulated, corresponding to different numbers of exhausted modes (dimensions) and associated radical species. Non-parametric representations are inherently adaptive, and rely on efficient approximate-nearest-neighbor queries. As the CSP information is only a function of the non-radical species in the system and has relatively small gradients in the chemical state space, tabulation occurs in a lower-dimensional state space and at a relatively coarse level, thereby improving scalability to larger chemical mechanisms. The approach is demonstrated on the simulation of homogeneous constant pressure H2–air and CH4–air ignition, over a range of initial conditions. For CH4–air, results are shown that outperform direct implicit integration of the stiff chemical kinetics while maintaining good accuracy.

##### BibTeX entry

@article{art_16, author = "B. J. Debusschere and Y. M. Marzouk and H. N. Najm and B. Rhoads and D. A. Goussis and M. Valorani", journal = "Combustion Theory and Modeling", number = "1", pages = "173-198", title = "Computational singular perturbation with non-parametric tabulation of slow manifolds for time integration of stiff chemical kinetics", volume = "16", year = "2011", doi = "10.1080/13647830.2011.596575", abstract = "This paper presents a novel tabulation strategy for the adaptive numerical integration of chemical kinetics using the computational singular perturbation (CSP) method. The strategy stores and reuses CSP quantities required to filter out fast dissipative processes, resulting in a non-stiff chemical source term. In particular, non-parametric regression on low-dimensional slow invariant manifolds (SIMs) in the chemical state space is used to approximate the CSP vectors spanning the fast chemical subspace and the associated fast chemical time-scales. The relevant manifold and its dimension varies depending on the local number of exhausted modes at every location in the chemical state space. Multiple manifolds are therefore tabulated, corresponding to different numbers of exhausted modes (dimensions) and associated radical species. Non-parametric representations are inherently adaptive, and rely on efficient approximate-nearest-neighbor queries. As the CSP information is only a function of the non-radical species in the system and has relatively small gradients in the chemical state space, tabulation occurs in a lower-dimensional state space and at a relatively coarse level, thereby improving scalability to larger chemical mechanisms. The approach is demonstrated on the simulation of homogeneous constant pressure H2--air and CH4--air ignition, over a range of initial conditions. For CH4--air, results are shown that outperform direct implicit integration of the stiff chemical kinetics while maintaining good accuracy.", keywords = "chemical kinetics, computational singular perturbation, slow manifold, non-parametric regression, nearest neighbors, kd-trees" }

*Truncated multi-Gaussian fields and effective conductance of binary media,*Advances in Water Resources, 34 (2011), pp. 617-626.

##### Truncated multi-Gaussian fields and effective conductance of binary media

Truncated Gaussian fields provide a flexible model for defining binary media with dispersed (as opposed to layered) inclusions. General properties of excursion sets on these truncated fields are coupled with a distance-based upscaling algorithm and approximations of point process theory to develop an estimation approach for effective conductivity in two-dimensions. Estimation of effective conductivity is derived directly from knowledge of the kernel size used to create the multiGaussian field, defined as the full-width at half maximum (FWHM), the truncation threshold and conductance values of the two modes. Therefore, instantiation of the multiGaussian field is not necessary for estimation of the effective conductance. The critical component of the effective medium approximation developed here is the mean distance between high conductivity inclusions. This mean distance is characterized as a function of the FWHM, the truncation threshold and the ratio of the two modal conductivities. Sensitivity of the resulting effective conductivity to this mean distance is examined for two levels of contrast in the modal conductances and different FWHM sizes. Results demonstrate that the FWHM is a robust measure of mean travel distance in the background medium. The resulting effective conductivities are accurate when compared to numerical results and results obtained from effective media theory, distance-based upscaling and numerical simulation.

##### BibTeX entry

@article{art_15, author = "S. McKenna and J. Ray and Y. M. Marzouk and B. van Bloemen Waanders", journal = "Advances in Water Resources", month = "5", number = "5", pages = "617-626", title = "Truncated multi-Gaussian fields and effective conductance of binary media", volume = "34", year = "2011", doi = "10.1016/j.advwatres.2011.02.011", abstract = "Truncated Gaussian fields provide a flexible model for defining binary media with dispersed (as opposed to layered) inclusions. General properties of excursion sets on these truncated fields are coupled with a distance-based upscaling algorithm and approximations of point process theory to develop an estimation approach for effective conductivity in two-dimensions. Estimation of effective conductivity is derived directly from knowledge of the kernel size used to create the multiGaussian field, defined as the full-width at half maximum (FWHM), the truncation threshold and conductance values of the two modes. Therefore, instantiation of the multiGaussian field is not necessary for estimation of the effective conductance. The critical component of the effective medium approximation developed here is the mean distance between high conductivity inclusions. This mean distance is characterized as a function of the FWHM, the truncation threshold and the ratio of the two modal conductivities. Sensitivity of the resulting effective conductivity to this mean distance is examined for two levels of contrast in the modal conductances and different FWHM sizes. Results demonstrate that the FWHM is a robust measure of mean travel distance in the background medium. The resulting effective conductivities are accurate when compared to numerical results and results obtained from effective media theory, distance-based upscaling and numerical simulation.", keywords = "Upscaling; Binary media; Effective conductivity" }

*Contributions of the wall boundary layer to the formation of the counter-rotating vortex pair in transverse jets.,*Journal of Fluid Mechanics, 676 (2011), pp. 461-490.

##### Contributions of the wall boundary layer to the formation of the counter-rotating vortex pair in transverse jets.

Using high-resolution 3-D vortex simulations, this study seeks a mechanistic understanding of vorticity dynamics in transverse jets at a finite Reynolds number. A full no-slip boundary condition, rigorously formulated in terms of vorticity generation along the channel wall, captures unsteady interactions between the wall boundary layer and the jet – in particular, the separation of the wall boundary layer and its transport into the interior. For comparison, we also implement a reduced boundary condition that suppresses the separation of the wall boundary layer away from the jet nozzle. By contrasting results obtained with these two boundary conditions, we characterize near-field vortical structures formed as the wall boundary layer separates on the backside of the jet. Using various Eulerian and Lagrangian diagnostics, it is demonstrated that several near-wall vortical structures are formed as the wall boundary layer separates. The counter-rotating vortex pair, manifested by the presence of vortices aligned with the jet trajectory, is initiated closer to the jet exit. Moreover tornado-like wall-normal vortices originate from the separation of spanwise vorticity in the wall boundary layer at the side of the jet and from the entrainment of streamwise wall vortices in the recirculation zone on the lee side. These tornado-like vortices are absent in the case where separation is suppressed. Tornado-like vortices merge with counter-rotating vorticity originating in the jet shear layer, significantly increasing wall-normal circulation and causing deeper jet penetration into the crossflow stream.

##### BibTeX entry

@article{art_14, author = "F. Schlegel and D. Wee and Y. M. Marzouk and A. F. Ghoniem", journal = "Journal of Fluid Mechanics", number = "1", pages = "461-490", title = "Contributions of the wall boundary layer to the formation of the counter-rotating vortex pair in transverse jets.", volume = "676", year = "2011", doi = "10.1017/jfm.2011.59", abstract = "Using high-resolution 3-D vortex simulations, this study seeks a mechanistic understanding of vorticity dynamics in transverse jets at a finite Reynolds number. A full no-slip boundary condition, rigorously formulated in terms of vorticity generation along the channel wall, captures unsteady interactions between the wall boundary layer and the jet -- in particular, the separation of the wall boundary layer and its transport into the interior. For comparison, we also implement a reduced boundary condition that suppresses the separation of the wall boundary layer away from the jet nozzle. By contrasting results obtained with these two boundary conditions, we characterize near-field vortical structures formed as the wall boundary layer separates on the backside of the jet. Using various Eulerian and Lagrangian diagnostics, it is demonstrated that several near-wall vortical structures are formed as the wall boundary layer separates. The counter-rotating vortex pair, manifested by the presence of vortices aligned with the jet trajectory, is initiated closer to the jet exit. Moreover tornado-like wall-normal vortices originate from the separation of spanwise vorticity in the wall boundary layer at the side of the jet and from the entrainment of streamwise wall vortices in the recirculation zone on the lee side. These tornado-like vortices are absent in the case where separation is suppressed. Tornado-like vortices merge with counter-rotating vorticity originating in the jet shear layer, significantly increasing wall-normal circulation and causing deeper jet penetration into the crossflow stream.", keywords = "jets; vortex flows" }

*Bayesian inference of atomic diffusivity in a binary Ni/Al system based on molecular dynamics,*SIAM Multiscale Modeling and Simulation, 9 (2011), pp. 486-512.

##### Bayesian inference of atomic diffusivity in a binary Ni/Al system based on molecular dynamics

This work focuses on characterizing the integral features of atomic diffusion in Ni/Al nanolaminates based on molecular dynamics (MD) computations. Attention is focused on the simplified problem of extracting the diffusivity, D, in an isothermal system at high temperature. To this end, a mixing measure theory is developed that relies on analyzing the moments of the cumulative distribution functions (CDFs) of the constituents. The mixing measures obtained from replica simulations are exploited in a Bayesian inference framework, based on contrasting these measures with corresponding moments of a dimensionless concentration evolving according to a Fickian process. The noise inherent in the MD simulations is described as a Gaussian process, and this hypothesis is verified both a priori and using a posterior predictive check. Computed values of D for an initially unmixed system rapidly heated to 1500 K are found to be consistent with experimental correlation for diffusion of Ni into molten Al. On the contrary, large discrepancies with experimental predictions are observed when D is estimated based on large-time mean-square displacement (MSD) analysis, and when it is evaluated using the Arrhenius correlation calibrated against experimental measurements of self-propagating front velocities. Implications are finally drawn regarding extension of the present work and potential refinement of continuum modeling approaches.

##### BibTeX entry

@article{art_13, author = "F. Rizzi and M. Salloum and Y. M. Marzouk and R. Xu and M. L. Falk and T. P. Weihs and G. Fritz and O. M. Knio", journal = "SIAM Multiscale Modeling and Simulation", number = "1", pages = "486-512", title = "Bayesian inference of atomic diffusivity in a binary Ni/Al system based on molecular dynamics", volume = "9", year = "2011", doi = "10.1137/10080590X", abstract = "This work focuses on characterizing the integral features of atomic diffusion in Ni/Al nanolaminates based on molecular dynamics (MD) computations. Attention is focused on the simplified problem of extracting the diffusivity, D, in an isothermal system at high temperature. To this end, a mixing measure theory is developed that relies on analyzing the moments of the cumulative distribution functions (CDFs) of the constituents. The mixing measures obtained from replica simulations are exploited in a Bayesian inference framework, based on contrasting these measures with corresponding moments of a dimensionless concentration evolving according to a Fickian process. The noise inherent in the MD simulations is described as a Gaussian process, and this hypothesis is verified both a priori and using a posterior predictive check. Computed values of D for an initially unmixed system rapidly heated to 1500 K are found to be consistent with experimental correlation for diffusion of Ni into molten Al. On the contrary, large discrepancies with experimental predictions are observed when D is estimated based on large-time mean-square displacement (MSD) analysis, and when it is evaluated using the Arrhenius correlation calibrated against experimental measurements of self-propagating front velocities. Implications are finally drawn regarding extension of the present work and potential refinement of continuum modeling approaches." }

*A Bayesian approach for estimating bioterror attacks from patient data,*Statistics in Medicine, 30 (2010), pp. 101-126.

##### A Bayesian approach for estimating bioterror attacks from patient data

Terrorist attacks using an aerosolized pathogen have gained credibility as a national security concern after the anthrax attacks of 2001. Inferring some important details of the attack quickly, for example, the number of people infected, the time of infection, and a representative dose received can be crucial to planning a medical response. We use a Bayesian approach, based on a short time series of diagnosed patients, to estimate a joint probability density for these parameters. We first test the formulation with idealized cases and then apply it to realistic scenarios, including the Sverdlovsk anthrax outbreak of 1979. We also use simulated outbreaks to explore the impact of model error, as when the model used for generating simulated epidemic curves does not match the model subsequently used to characterize the attack. We find that in all cases except for the smallest attacks (fewer than 100 infected people), 3–5 days of data are sufficient to characterize the outbreak to a specificity that is useful for directing an emergency response.

##### BibTeX entry

@article{art_12, author = "J. Ray and Y. M. Marzouk and H. N. Najm", journal = "Statistics in Medicine", number = "2", pages = "101-126", title = "A Bayesian approach for estimating bioterror attacks from patient data", volume = "30", year = "2010", doi = "10.1002/sim.4090", abstract = "Terrorist attacks using an aerosolized pathogen have gained credibility as a national security concern after the anthrax attacks of 2001. Inferring some important details of the attack quickly, for example, the number of people infected, the time of infection, and a representative dose received can be crucial to planning a medical response. We use a Bayesian approach, based on a short time series of diagnosed patients, to estimate a joint probability density for these parameters. We first test the formulation with idealized cases and then apply it to realistic scenarios, including the Sverdlovsk anthrax outbreak of 1979. We also use simulated outbreaks to explore the impact of model error, as when the model used for generating simulated epidemic curves does not match the model subsequently used to characterize the attack. We find that in all cases except for the smallest attacks (fewer than 100 infected people), 3--5 days of data are sufficient to characterize the outbreak to a specificity that is useful for directing an emergency response.", keywords = "Bayesian inference;anthrax;Sverdlovsk outbreak;bioterrorism" }

*Convergence characteristics and computational cost of two algebraic kernels in vortex methods with a tree-code algorithm,*SIAM Journal on Scientific Computing, 31 (2009), pp. 2510-2527.

##### Convergence characteristics and computational cost of two algebraic kernels in vortex methods with a tree-code algorithm

We study the convergence characteristics of two algebraic kernels used in vortex calculations: the Rosenhead–Moore kernel, which is a low-order kernel, and the Winckelmans–Leonard kernel, which is a high-order kernel. To facilitate the study, a method of evaluating particle-cluster interactions is introduced for the Winckelmans–Leonard kernel. The method is based on Taylor series expansion in Cartesian coordinates, as initially proposed by Lindsay and Krasny [J. Comput. Phys., 172 (2001), pp. 879–907] for the Rosenhead–Moore kernel. A recurrence relation for the Taylor coefficients of the Winckelmans–Leonard kernel is derived by separating the kernel into two parts, and an error estimate is obtained to ensure adaptive error control. The recurrence relation is incorporated into a tree-code to evaluate vorticity-induced velocity. Next, comparison of convergence is made while utilizing the tree-code. Both algebraic kernels lead to convergence, but the Winckelmans–Leonard kernel exhibits a superior convergence rate. The combined desingularization and discretization error from the Winckelmans–Leonard kernel is an order of magnitude smaller than that from the Rosenhead–Moore kernel at a typical resolution. Simulations of vortex rings are performed using the two algebraic kernels in order to compare their performance in a practical setting. In particular, numerical simulations of the side-by-side collision of two identical vortex rings suggest that the three-dimensional evolution of vorticity at finite resolution can be greatly affected by the choice of the kernel. We find that the Winckelmans–Leonard kernel is able to perform the same task with a much smaller number of vortex elements than the Rosenhead–Moore kernel, greatly reducing the overall computational cost.

##### BibTeX entry

@article{art_11, author = "D. Wee and Y. M. Marzouk and F. Schlegel and A. F. Ghoniem", journal = "SIAM Journal on Scientific Computing", number = "4", pages = "2510-2527", title = "Convergence characteristics and computational cost of two algebraic kernels in vortex methods with a tree-code algorithm", volume = "31", year = "2009", doi = "10.1137/080726872", abstract = "We study the convergence characteristics of two algebraic kernels used in vortex calculations: the Rosenhead--Moore kernel, which is a low-order kernel, and the Winckelmans--Leonard kernel, which is a high-order kernel. To facilitate the study, a method of evaluating particle-cluster interactions is introduced for the Winckelmans--Leonard kernel. The method is based on Taylor series expansion in Cartesian coordinates, as initially proposed by Lindsay and Krasny [J. Comput. Phys., 172 (2001), pp. 879--907] for the Rosenhead--Moore kernel. A recurrence relation for the Taylor coefficients of the Winckelmans--Leonard kernel is derived by separating the kernel into two parts, and an error estimate is obtained to ensure adaptive error control. The recurrence relation is incorporated into a tree-code to evaluate vorticity-induced velocity. Next, comparison of convergence is made while utilizing the tree-code. Both algebraic kernels lead to convergence, but the Winckelmans--Leonard kernel exhibits a superior convergence rate. The combined desingularization and discretization error from the Winckelmans--Leonard kernel is an order of magnitude smaller than that from the Rosenhead--Moore kernel at a typical resolution. Simulations of vortex rings are performed using the two algebraic kernels in order to compare their performance in a practical setting. In particular, numerical simulations of the side-by-side collision of two identical vortex rings suggest that the three-dimensional evolution of vorticity at finite resolution can be greatly affected by the choice of the kernel. We find that the Winckelmans--Leonard kernel is able to perform the same task with a much smaller number of vortex elements than the Rosenhead--Moore kernel, greatly reducing the overall computational cost." }

*Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems,*Journal of Computational Physics, 228 (2009), pp. 1862-1902.

##### Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems

We consider a Bayesian approach to nonlinear inverse problems in which the unknown quantity is a spatial or temporal field, endowed with a hierarchical Gaussian process prior. Computational challenges in this construction arise from the need for repeated evaluations of the forward model (e.g., in the context of Markov chain Monte Carlo) and are compounded by high dimensionality of the posterior. We address these challenges by introducing truncated Karhunen–Loève expansions, based on the prior distribution, to efficiently parameterize the unknown field and to specify a stochastic forward problem whose solution captures that of the deterministic forward model over the support of the prior. We seek a solution of this problem using Galerkin projection on a polynomial chaos basis, and use the solution to construct a reduced-dimensionality surrogate posterior density that is inexpensive to evaluate. We demonstrate the formulation on a transient diffusion equation with prescribed source terms, inferring the spatially-varying diffusivity of the medium from limited and noisy data.

##### BibTeX entry

@article{art_10, author = "Y. M. Marzouk and H. N. Najm", journal = "Journal of Computational Physics", number = "6", pages = "1862-1902", title = "Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems", volume = "228", year = "2009", doi = "10.1016/j.jcp.2008.11.024", abstract = "We consider a Bayesian approach to nonlinear inverse problems in which the unknown quantity is a spatial or temporal field, endowed with a hierarchical Gaussian process prior. Computational challenges in this construction arise from the need for repeated evaluations of the forward model (e.g., in the context of Markov chain Monte Carlo) and are compounded by high dimensionality of the posterior. We address these challenges by introducing truncated Karhunen--Loève expansions, based on the prior distribution, to efficiently parameterize the unknown field and to specify a stochastic forward problem whose solution captures that of the deterministic forward model over the support of the prior. We seek a solution of this problem using Galerkin projection on a polynomial chaos basis, and use the solution to construct a reduced-dimensionality surrogate posterior density that is inexpensive to evaluate. We demonstrate the formulation on a transient diffusion equation with prescribed source terms, inferring the spatially-varying diffusivity of the medium from limited and noisy data.", keywords = "Inverse problems; Bayesian inference; Dimensionality reduction; Polynomial chaos; Markov chain Monte Carlo; Galerkin projection; Gaussian processes; Karhunen--Loève expansion; RKHS" }

*A stochastic collocation approach to Bayesian inference in inverse problems,*Communications in Computational Physics, 6 (2009), pp. 826-847.

##### A stochastic collocation approach to Bayesian inference in inverse problems

We present an efficient numerical strategy for the Bayesian solution of inverse problems. Stochastic collocation methods, based on generalized polynomial chaos (gPC), are used to construct a polynomial approximation of the forward solution over the support of the prior distribution. This approximation then defines a surrogate posterior probability density that can be evaluated repeatedly at minimal computational cost. The ability to simulate a large number of samples from the posterior distribution results in very accurate estimates of the inverse solution and its associated uncertainty. Combined with high accuracy of the gPC-based forward solver, the new algorithm can provide great efficiency in practical applications. A rigorous error analysis of the algorithm is conducted, where we establish convergence of the approximate posterior to the true posterior and obtain an estimate of the convergence rate. It is proved that fast (exponential) convergence of the gPC forward solution yields similarly fast (exponential) convergence of the posterior. The numerical strategy and the predicted convergence rates are then demonstrated on nonlinear inverse problems ofvarying smoothness and dimension.

##### BibTeX entry

@article{art_9, author = "Y. M. Marzouk and D. Xiu", journal = "Communications in Computational Physics", number = "1", pages = "826-847", title = "A stochastic collocation approach to Bayesian inference in inverse problems", volume = "6", year = "2009", doi = "DOI:prism/16", abstract = "We present an efficient numerical strategy for the Bayesian solution of inverse problems. Stochastic collocation methods, based on generalized polynomial chaos (gPC), are used to construct a polynomial approximation of the forward solution over the support of the prior distribution. This approximation then defines a surrogate posterior probability density that can be evaluated repeatedly at minimal computational cost. The ability to simulate a large number of samples from the posterior distribution results in very accurate estimates of the inverse solution and its associated uncertainty. Combined with high accuracy of the gPC-based forward solver, the new algorithm can provide great efficiency in practical applications. A rigorous error analysis of the algorithm is conducted, where we establish convergence of the approximate posterior to the true posterior and obtain an estimate of the convergence rate. It is proved that fast (exponential) convergence of the gPC forward solution yields similarly fast (exponential) convergence of the posterior. The numerical strategy and the predicted convergence rates are then demonstrated on nonlinear inverse problems ofvarying smoothness and dimension." }

*Uncertainty quantification in chemical systems,*International Journal for Numerical Methods in Engineering, 80 (2009), pp. 789-814.

##### Uncertainty quantification in chemical systems

We demonstrate the use of multiwavelet spectral polynomial chaos techniques for uncertainty quantification in non-isothermal ignition of a methane–air system. We employ Bayesian inference for identifying the probabilistic representation of the uncertain parameters and propagate this uncertainty through the ignition process. We analyze the time evolution of moments and probability density functions of the solution. We also examine the role and significance of dependence among the uncertain parameters. We finish with a discussion of the role of non-linearity and the performance of the algorithm.

##### BibTeX entry

@article{art_8, author = "H. N. Najm and B. J. Debusschere and Y. M. Marzouk and S. Widmer and O. LeMaître", journal = "International Journal for Numerical Methods in Engineering", number = "6-7", pages = "789-814", title = "Uncertainty quantification in chemical systems", volume = "80", year = "2009", doi = "10.1002/nme.2551", abstract = "We demonstrate the use of multiwavelet spectral polynomial chaos techniques for uncertainty quantification in non-isothermal ignition of a methane--air system. We employ Bayesian inference for identifying the probabilistic representation of the uncertain parameters and propagate this uncertainty through the ignition process. We analyze the time evolution of moments and probability density functions of the solution. We also examine the role and significance of dependence among the uncertain parameters. We finish with a discussion of the role of non-linearity and the performance of the algorithm.", keywords = "uncertainty quantification; polynomial chaos; multiwavelets; chemistry; ignition" }

*Bayesian inference of spectral expansions for predictability assessment in stochastic reaction networks,*Journal of Computational and Theoretical Nanoscience, 6 (2009), pp. 2283-2297.

##### Bayesian inference of spectral expansions for predictability assessment in stochastic reaction networks

Stochastic reaction networks modeled as jump Markov processes serve as the main mathematical representation of biochemical phenomena in cells, particularly when the relevant molecule count is low, causing deterministic macroscale chemical reaction models to fail. Further, as there is mainly empirical knowledge about the rate parameters, parametric uncertainty analysis becomes very important. The conventional predictability tools for deterministic systems do not readily generalize to the stochastic setting. We use spectral polynomial chaos expansions to represent stochastic processes. Bayesian inference techniques with Markov chain Monte Carlo are used to find the best spectral representation of the system state, taking into account not only intrinsic stochastic noise but also parametric uncertainties. A likelihood-based adaptive domain decomposition is introduced and applied, in particular, for the cases when the parameter range includes deterministic bifurcations. We show that the adaptive multidomain polynomial chaos representation captures the correct system behavior for a benchmark bistable Schlögl model for a wide range of parameter variations.

##### BibTeX entry

@article{art_7, author = "K. Sargsyan and B. J. Debusschere and H. N. Najm and Y. M. Marzouk", journal = "Journal of Computational and Theoretical Nanoscience", number = "10", pages = "2283-2297", title = "Bayesian inference of spectral expansions for predictability assessment in stochastic reaction networks", volume = "6", year = "2009", doi = "10.1166/jctn.2009.1285", abstract = "Stochastic reaction networks modeled as jump Markov processes serve as the main mathematical representation of biochemical phenomena in cells, particularly when the relevant molecule count is low, causing deterministic macroscale chemical reaction models to fail. Further, as there is mainly empirical knowledge about the rate parameters, parametric uncertainty analysis becomes very important. The conventional predictability tools for deterministic systems do not readily generalize to the stochastic setting. We use spectral polynomial chaos expansions to represent stochastic processes. Bayesian inference techniques with Markov chain Monte Carlo are used to find the best spectral representation of the system state, taking into account not only intrinsic stochastic noise but also parametric uncertainties. A likelihood-based adaptive domain decomposition is introduced and applied, in particular, for the cases when the parameter range includes deterministic bifurcations. We show that the adaptive multidomain polynomial chaos representation captures the correct system behavior for a benchmark bistable Schlögl model for a wide range of parameter variations.", keywords = "uncertainty quantification; bayesian inference; polynomial chaos; stochastic reaction networks; domain decomposition; predictability" }

*Stochastic spectral methods for efficient Bayesian solution of inverse problems,*Journal of Computational Physics, 224 (2007), pp. 560-586.

##### Stochastic spectral methods for efficient Bayesian solution of inverse problems

We present a reformulation of the Bayesian approach to inverse problems, that seeks to accelerate Bayesian inference by using polynomial chaos (PC) expansions to represent random variables. Evaluation of integrals over the unknown parameter space is recast, more efficiently, as Monte Carlo sampling of the random variables underlying the PC expansion. We evaluate the utility of this technique on a transient diffusion problem arising in contaminant source inversion. The accuracy of posterior estimates is examined with respect to the order of the PC representation, the choice of PC basis, and the decomposition of the support of the prior. The computational cost of the new scheme shows significant gains over direct sampling.

##### BibTeX entry

@article{art_6, author = "Y. M. Marzouk and H. N. Najm and L. A. Rahn,", journal = "Journal of Computational Physics", number = "2", pages = "560-586", title = "Stochastic spectral methods for efficient Bayesian solution of inverse problems", volume = "224", year = "2007", doi = "10.1016/j.jcp.2006.10.010", abstract = "We present a reformulation of the Bayesian approach to inverse problems, that seeks to accelerate Bayesian inference by using polynomial chaos (PC) expansions to represent random variables. Evaluation of integrals over the unknown parameter space is recast, more efficiently, as Monte Carlo sampling of the random variables underlying the PC expansion. We evaluate the utility of this technique on a transient diffusion problem arising in contaminant source inversion. The accuracy of posterior estimates is examined with respect to the order of the PC representation, the choice of PC basis, and the decomposition of the support of the prior. The computational cost of the new scheme shows significant gains over direct sampling.", keywords = "Inverse problems; Bayesian inference; Polynomial chaos; Monte Carlo; Markov chain Monte Carlo; Spectral methods; Galerkin projection; Diffusive transport" }

*Vorticity structure and evolution in a transverse jet,*Journal of Fluid Mechanics, 575 (2007), pp. 267-305.

##### Vorticity structure and evolution in a transverse jet

Transverse jets arise in many applications, including propulsion, effluent dispersion, oil field flows, and V/STOL aerodynamics. This study seeks a fundamental, mechanistic understanding of the structure and evolution of vorticity in the transverse jet. We develop a high-resolution three-dimensional vortex simulation of the transverse jet at large Reynolds number and consider jet-to-crossflow velocity ratios r ranging from 5 to 10. A new formulation of vorticity-flux boundary conditions accounts for the interaction of channel wall vorticity with the jet flow immediately around the orifice. We demonstrate that the nascent jet shear layer contains not only azimuthal vorticity generated in the jet pipe, but wall-normal and azimuthal perturbations resulting from the jet–crossflow interaction. This formulation also yields analytical expressions for vortex lines in the near field as a function of $r$. Transformation of the cylindrical shear layer emanating from the orifice begins with axial elongation of its lee side to form sections of counter-rotating vorticity aligned with the jet trajectory. Periodic roll-up of the shear layer accompanies this deformation, creating complementary vortex arcs on the lee and windward sides of the jet. Counter-rotating vorticity then drives lee-side roll-ups in the windward direction, along the normal to the jet trajectory. Azimuthal vortex arcs of alternating sign thus approach each other on the windward boundary of the jet. Accordingly, initially planar material rings on the shear layer fold completely and assume an interlocking structure that persists for several diameters above the jet exit. Though the near field of the jet is dominated by deformation and periodic roll-up of the shear layer, the resulting counter-rotating vorticity is a pronounced feature of the mean field; in turn, the mean counter-rotation exerts a substantial influence on the deformation of the shear layer. Following the pronounced bending of the trajectory into the crossflow, we observe a sudden breakdown of near-field vortical structures into a dense distribution of smaller scales. Spatial filtering of this region reveals the persistence of counter-rotating streamwise vorticity initiated in the near field.

##### BibTeX entry

@article{art_5, author = "Y. M. Marzouk and A. F. Ghoniem", journal = "Journal of Fluid Mechanics", number = "1", pages = "267-305", title = "Vorticity structure and evolution in a transverse jet", volume = "575", year = "2007", doi = "10.1017/S0022112006004411", abstract = "Transverse jets arise in many applications, including propulsion, effluent dispersion, oil field flows, and V/STOL aerodynamics. This study seeks a fundamental, mechanistic understanding of the structure and evolution of vorticity in the transverse jet. We develop a high-resolution three-dimensional vortex simulation of the transverse jet at large Reynolds number and consider jet-to-crossflow velocity ratios r ranging from 5 to 10. A new formulation of vorticity-flux boundary conditions accounts for the interaction of channel wall vorticity with the jet flow immediately around the orifice. We demonstrate that the nascent jet shear layer contains not only azimuthal vorticity generated in the jet pipe, but wall-normal and azimuthal perturbations resulting from the jet--crossflow interaction. This formulation also yields analytical expressions for vortex lines in the near field as a function of $r$. Transformation of the cylindrical shear layer emanating from the orifice begins with axial elongation of its lee side to form sections of counter-rotating vorticity aligned with the jet trajectory. Periodic roll-up of the shear layer accompanies this deformation, creating complementary vortex arcs on the lee and windward sides of the jet. Counter-rotating vorticity then drives lee-side roll-ups in the windward direction, along the normal to the jet trajectory. Azimuthal vortex arcs of alternating sign thus approach each other on the windward boundary of the jet. Accordingly, initially planar material rings on the shear layer fold completely and assume an interlocking structure that persists for several diameters above the jet exit. Though the near field of the jet is dominated by deformation and periodic roll-up of the shear layer, the resulting counter-rotating vorticity is a pronounced feature of the mean field; in turn, the mean counter-rotation exerts a substantial influence on the deformation of the shear layer. Following the pronounced bending of the trajectory into the crossflow, we observe a sudden breakdown of near-field vortical structures into a dense distribution of smaller scales. Spatial filtering of this region reveals the persistence of counter-rotating streamwise vorticity initiated in the near field." }

*K-means clustering for optimal partitioning and dynamic load balancing of parallel hierarchical N-body simulations,*Journal of Computational Physics, 207 (2005), pp. 493-528.

##### K-means clustering for optimal partitioning and dynamic load balancing of parallel hierarchical N-body simulations

A number of complex physical problems can be approached through N-body simulation, from fluid flow at high Reynolds number to gravitational astrophysics and molecular dynamics. In all these applications, direct summation is prohibitively expensive for large N and thus hierarchical methods are employed for fast summation. This work introduces new algorithms, based on k-means clustering, for partitioning parallel hierarchical N-body interactions. We demonstrate that the number of particle–cluster interactions and the order at which they are performed are directly affected by partition geometry. Weighted k-means partitions minimize the sum of clusters’ second moments and create well-localized domains, and thus reduce the computational cost of N-body approximations by enabling the use of lower-order approximations and fewer cells. We also introduce compatible techniques for dynamic load balancing, including adaptive scaling of cluster volumes and adaptive redistribution of cluster centroids. We demonstrate the performance of these algorithms by constructing a parallel treecode for vortex particle simulations, based on the serial variable-order Cartesian code developed by Lindsay and Krasny [Journal of Computational Physics 172 (2) (2001) 879–907]. The method is applied to vortex simulations of a transverse jet. Results show outstanding parallel efficiencies even at high concurrencies, with velocity evaluation errors maintained at or below their serial values; on a realistic distribution of 1.2 million vortex particles, we observe a parallel efficiency of 98% on 1024 processors. Excellent load balance is achieved even in the face of several obstacles, such as an irregular, time-evolving particle distribution containing a range of length scales and the continual introduction of new vortex particles throughout the domain. Moreover, results suggest that k-means yields a more efficient partition of the domain than a global oct-tree.

##### BibTeX entry

@article{art_4, author = "Y. M. Marzouk and A. F. Ghoniem", journal = "Journal of Computational Physics", number = "2", pages = "493-528", title = "K-means clustering for optimal partitioning and dynamic load balancing of parallel hierarchical N-body simulations", volume = "207", year = "2005", doi = "10.1016/j.jcp.2005.01.021", abstract = "A number of complex physical problems can be approached through N-body simulation, from fluid flow at high Reynolds number to gravitational astrophysics and molecular dynamics. In all these applications, direct summation is prohibitively expensive for large N and thus hierarchical methods are employed for fast summation. This work introduces new algorithms, based on k-means clustering, for partitioning parallel hierarchical N-body interactions. We demonstrate that the number of particle--cluster interactions and the order at which they are performed are directly affected by partition geometry. Weighted k-means partitions minimize the sum of clusters’ second moments and create well-localized domains, and thus reduce the computational cost of N-body approximations by enabling the use of lower-order approximations and fewer cells. We also introduce compatible techniques for dynamic load balancing, including adaptive scaling of cluster volumes and adaptive redistribution of cluster centroids. We demonstrate the performance of these algorithms by constructing a parallel treecode for vortex particle simulations, based on the serial variable-order Cartesian code developed by Lindsay and Krasny [Journal of Computational Physics 172 (2) (2001) 879--907]. The method is applied to vortex simulations of a transverse jet. Results show outstanding parallel efficiencies even at high concurrencies, with velocity evaluation errors maintained at or below their serial values; on a realistic distribution of 1.2 million vortex particles, we observe a parallel efficiency of 98% on 1024 processors. Excellent load balance is achieved even in the face of several obstacles, such as an irregular, time-evolving particle distribution containing a range of length scales and the continual introduction of new vortex particles throughout the domain. Moreover, results suggest that k-means yields a more efficient partition of the domain than a global oct-tree.", keywords = "k-means clustering; Treecode; N-body problems; Hierarchical methods; Parallel processing; Load balancing; Particle methods; Vortex methods; Three-dimensional flow; Transverse jet" }

*Toward a flame embedding model for turbulent combustion simulation,*AIAA Journal, 41 (2003), pp. 641-652.

##### Toward a flame embedding model for turbulent combustion simulation

Combustion in turbulent flows may take the form of a thin flame wrapped around vortical structures. For this regime, the flame embedding approach seeks to decouple computations of the outer nonreacting flow and the combustion zone by discretizing the flame surface into a number of elemental flames, each incorporating the local impact of unsteady flow-flame interaction. An unsteady strained laminar flame solver, based on a boundary-layer approximation of combustion in a time-dependent stagnation-point potential flow, is proposed as an elemental flame model. To validate the concept, two-dimensional simulations of premixed flame-vortex interactions are performed for a matrix of vortex strengths and length scales, and a section of the flame is selected for comparison with the flame embedding model results. Results show that using the flame leading-edge strain rate gives reasonable agreement in the cases of low strain rate and weak strain rate gradient within the flame structure. This agreement deteriorates substantially when both are high. We propose two different schemes, both based on averaging the strain rate across the flame structure, and demonstrate that agreement between the one-dimensional model and the two-dimensional simulation greatly improves when the actual strain rate at the reaction zone of the one-dimensional flame is made to match that of the two-dimensional flame.

##### BibTeX entry

@article{art_3, author = "Y. M. Marzouk and A. F. Ghoniem and H. N. Najm", journal = "AIAA Journal", number = "4", pages = "641-652", title = "Toward a flame embedding model for turbulent combustion simulation", volume = "41", year = "2003", abstract = "Combustion in turbulent flows may take the form of a thin flame wrapped around vortical structures. For this regime, the flame embedding approach seeks to decouple computations of the outer nonreacting flow and the combustion zone by discretizing the flame surface into a number of elemental flames, each incorporating the local impact of unsteady flow-flame interaction. An unsteady strained laminar flame solver, based on a boundary-layer approximation of combustion in a time-dependent stagnation-point potential flow, is proposed as an elemental flame model. To validate the concept, two-dimensional simulations of premixed flame-vortex interactions are performed for a matrix of vortex strengths and length scales, and a section of the flame is selected for comparison with the flame embedding model results. Results show that using the flame leading-edge strain rate gives reasonable agreement in the cases of low strain rate and weak strain rate gradient within the flame structure. This agreement deteriorates substantially when both are high. We propose two different schemes, both based on averaging the strain rate across the flame structure, and demonstrate that agreement between the one-dimensional model and the two-dimensional simulation greatly improves when the actual strain rate at the reaction zone of the one-dimensional flame is made to match that of the two-dimensional flame.", keywords = "Two dimensional model; One dimensional model; Premixed flame; Finite difference method; Numerical simulation; Flame structure; Turbulent flame; Combustion" }

*Dynamic response of strained premixed flames to equivalence ratio gradients,*Proceedings of the Combustion Institute, 28 (2000), pp. 1859-1866.

##### Dynamic response of strained premixed flames to equivalence ratio gradients

Premixed flames encounter gradients of mixture equivalence ratio in stratified charge engines, lean premixed gas-turbine engines, and a variety of other applications. In cases for which the scales—spatial or temporal—of fuel concentration gradients in the reactants are comparable to flame scales, changes in burning rate, flammability limits, and flame structure have been observed. This paper uses an unsteady strained flame in the stagnation point configuration to examine the effect of temporal gradients on combustion in a premixed methane/air mixture. An inexact Newton backtracking method, coupled with a preconditioned Krylov subspace iterative solver, was used to improve the efficiency of the numerical solution and expand its domain of convergence in the presence of detailed chemistry. Results indicate that equivalence ratio variations with timescales lower than 10 ms have significant effects on the burning process, including reaction zone broadening, burning rate enhancement, and extension of the flammability limit toward learner mixtures. While the temperature of a flame processing a stoichiometric-to-lean equivalence ratio gradient decreased slightly within the front side of the reaction zone, radical concentrations remained elevated over the entire flame structure. These characteristics are linked to a feature reminiscent of “back-supported” flames—flames in which a stream of products resulting from burning at higher equivalence ratio is continuously supplied to lower equivalence ratio reactants. The relevant feature is the establishment of a positive temperature gradient on the products side of the flame which maintains the temperature high enough and the radical concentration sufficient to sustain combustion there. Unsteadiness in equivalence ratio produces similar gradients within the flame structure, thus compensating for the change in temperature at the leading edge of the reaction zone and accounting for an observed “flame inertia”. For sufficiently large equivalence ratio gradients, a flame starting in a stoichiometric mixture can burn through a very lean one by taking advantage of this mechanism.

##### BibTeX entry

@article{art_2, author = "Y. M. Marzouk and A. F. Ghoniem and H. N. Najm", journal = "Proceedings of the Combustion Institute", number = "2", pages = "1859-1866", title = "Dynamic response of strained premixed flames to equivalence ratio gradients", volume = "28", year = "2000", doi = "10.1016/S0082-0784(00)80589-5", abstract = "Premixed flames encounter gradients of mixture equivalence ratio in stratified charge engines, lean premixed gas-turbine engines, and a variety of other applications. In cases for which the scales—spatial or temporal—of fuel concentration gradients in the reactants are comparable to flame scales, changes in burning rate, flammability limits, and flame structure have been observed. This paper uses an unsteady strained flame in the stagnation point configuration to examine the effect of temporal gradients on combustion in a premixed methane/air mixture. An inexact Newton backtracking method, coupled with a preconditioned Krylov subspace iterative solver, was used to improve the efficiency of the numerical solution and expand its domain of convergence in the presence of detailed chemistry. Results indicate that equivalence ratio variations with timescales lower than 10 ms have significant effects on the burning process, including reaction zone broadening, burning rate enhancement, and extension of the flammability limit toward learner mixtures. While the temperature of a flame processing a stoichiometric-to-lean equivalence ratio gradient decreased slightly within the front side of the reaction zone, radical concentrations remained elevated over the entire flame structure. These characteristics are linked to a feature reminiscent of “back-supported” flames—flames in which a stream of products resulting from burning at higher equivalence ratio is continuously supplied to lower equivalence ratio reactants. The relevant feature is the establishment of a positive temperature gradient on the products side of the flame which maintains the temperature high enough and the radical concentration sufficient to sustain combustion there. Unsteadiness in equivalence ratio produces similar gradients within the flame structure, thus compensating for the change in temperature at the leading edge of the reaction zone and accounting for an observed “flame inertia”. For sufficiently large equivalence ratio gradients, a flame starting in a stoichiometric mixture can burn through a very lean one by taking advantage of this mechanism." }

*Asymmetric autocorrelation function to resolve directional ambiguity in PIV images,*Experiments in Fluids, 25 (1998), pp. 401-408.

##### Asymmetric autocorrelation function to resolve directional ambiguity in PIV images

Autocorrelation of a double-exposed image, unlike cross-correlation between two images, produces a correlation function that is symmetric about the origin. Thus, while it is possible to calculate the speed and direction of tracer particles in a particle image velocimetry (PIV) image using autocorrelation, it is impossible to tell whether the velocity is in the positive or negative direction. This ambiguity can be resolved by spatially shifting one exposure relative to the next or labeling exposures with color or polarization, but the complexity and limitations of these methods can be prohibitive. It is, however, possible to resolve the sign of the velocity from a triple-exposed image using unequal time intervals between exposures. Triple-exposed images, like double-exposed images, correlate symmetrically about zero. The directional ambiguity, however, can be resolved by calculating the probability that the three exposures occur in a specific temporal order; that is, by assuming that the correlation has a specific sign and testing to see if the assumption is correct. Traditional spectral and statistical correlation techniques are unable to accomplish this. Presented herein is a computationally efficient asymmetric correlation function that is able to differentiate the temporal order of triple exposed images. Included is a discussion of the limitations of this function and of difficulties in experimental implementation.

##### BibTeX entry

@article{art_1, author = "Y. M. Marzouk and D. P. Hart", journal = "Experiments in Fluids", number = "5-6", pages = "401-408", title = "Asymmetric autocorrelation function to resolve directional ambiguity in PIV images", volume = "25", year = "1998", doi = "10.1007/s003480050247", abstract = "Autocorrelation of a double-exposed image, unlike cross-correlation between two images, produces a correlation function that is symmetric about the origin. Thus, while it is possible to calculate the speed and direction of tracer particles in a particle image velocimetry (PIV) image using autocorrelation, it is impossible to tell whether the velocity is in the positive or negative direction. This ambiguity can be resolved by spatially shifting one exposure relative to the next or labeling exposures with color or polarization, but the complexity and limitations of these methods can be prohibitive. It is, however, possible to resolve the sign of the velocity from a triple-exposed image using unequal time intervals between exposures. Triple-exposed images, like double-exposed images, correlate symmetrically about zero. The directional ambiguity, however, can be resolved by calculating the probability that the three exposures occur in a specific temporal order; that is, by assuming that the correlation has a specific sign and testing to see if the assumption is correct. Traditional spectral and statistical correlation techniques are unable to accomplish this. Presented herein is a computationally efficient asymmetric correlation function that is able to differentiate the temporal order of triple exposed images. Included is a discussion of the limitations of this function and of difficulties in experimental implementation." }

**No articles found matching filter.**

##### Announcements

**September 2020**

Congratulations to Michael Brennan, Daniele Bigoni, Olivier Zahm, and Alessio Spantini on their paper accepted for oral presentation at NeurIPS 2020!

**April 2020**

Congratulations to Aimee Maurais for winning an NSF Graduate Research Fellowship!

**April 2020**

Congratulations to Robert Ren for winning an NSF Graduate Research Fellowship!

**March 2020**

Congratulations to Antoni Musolas for successfully defending his PhD thesis!

**December 2019**

Congratulations to Jon Paul Janet for successfully defending his PhD thesis!

**November 2019**

Congratulations to Jakob Zech for accepting a faculty position in the Institute of Applied Mathematics at the University of Heidelberg!

**September 2019**

Congratulations to Ben Zhang for winning the MathWorks Fellowship from the MIT School of Engineering.