Brad Bell

Senior Principal Mathematician





Research Interests

Optimization, Statistics, Numerical Methods, Software Engineering


Dr. Bell's graduate research was in optimization. Since then, much of his research has concerned applications of optimization and numerical methods to data analysis. This includes procedures for properly weighting data from various sources, Kalman smoothing algorithms for the nonlinear case, and procedures for estimating signal arrival times.
He designed a compartmental modeling program that is used by researchers around the world to analyze the kinetics of drugs and naturally occurring compounds in humans and other animals. He is currently working on algorithms that estimate the variation between realizations of scientific models, with special emphasis on modeling pharmacokinetics. He joined APL-UW in 1978.


Education

B.A. Mathematics and Physics, Saint Lawrence University, 1973

M.A. Mathematics, University of Washington, 1976

Ph.D. Mathematics, University of Washington, 1984


Publications

2000-present and while at APL-UW

The connection between Bayesian estimation of a Gaussian random field and RKHS

Aravkin, A.Y., B.M. Bell, J.V. Burke, and G. Pillonetto, "The connection between Bayesian estimation of a Gaussian random field and RKHS," IEEE Trans. Neural Networks Learn. Syst., 26, 15-18, doi:10.1109/TNNLS.2014.2337939, 2015.

1 Jul 2015

Reconstruction of a function from noisy data is key in machine learning and is often formulated as a regularized optimization problem over an infinite-dimensional reproducing kernel Hilbert space (RKHS). The solution suitably balances adherence to the observed data and the corresponding RKHS norm. When the data fit is measured using a quadratic loss, this estimator has a known statistical interpretation. Given the noisy measurements, the RKHS estimate represents the posterior mean (minimum variance estimate) of a Gaussian random field with covariance proportional to the kernel associated with the RKHS. In this brief, we provide a statistical interpretation when more general losses are used, such as absolute value, Vapnik or Huber. Specifically, for any finite set of sampling locations (that includes where the data were collected), the maximum a posteriori estimate for the signal samples is given by the RKHS estimate evaluated at the sampling locations. This connection establishes a firm statistical foundation for several stochastic approaches used to estimate unknown regularization parameters. To illustrate this, we develop a numerical scheme that implements a Bayesian estimator with an absolute value loss. This estimator is used to learn a function from measurements contaminated by outliers.
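The quadratic-loss case described above can be sketched numerically (the kernel, data, and noise level below are arbitrary choices, not taken from the paper): the RKHS/kernel-ridge estimate obtained by numerical optimization coincides with the closed-form posterior mean of a Gaussian random field whose covariance is the kernel.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch only: arbitrary RBF kernel, data, and noise level.
def rbf_kernel(a, b, ell=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)                     # sampling locations
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
sigma2 = 0.01                                     # measurement noise variance
K = rbf_kernel(x, x)

# RKHS (kernel ridge) estimate: minimize ||y - K a||^2 + sigma2 * a' K a
def obj(a):
    return np.sum((y - K @ a) ** 2) + sigma2 * a @ K @ a

def grad(a):
    return -2 * K @ (y - K @ a) + 2 * sigma2 * K @ a

res = minimize(obj, np.zeros(x.size), jac=grad, method="BFGS",
               options={"gtol": 1e-8})
ridge_fit = K @ res.x

# Posterior mean of the Gaussian random field with covariance K, given y
gp_mean = K @ np.linalg.solve(K + sigma2 * np.eye(x.size), y)
```

The two fits agree at the sampling locations, which is the statistical interpretation the quadratic case admits.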

Distributed Kalman smoothing in static Bayesian networks

Pillonetto, G., B.M. Bell, and S. Del Favero, "Distributed Kalman smoothing in static Bayesian networks," Automatica, 49, 1001-1011, doi:10.1016/j.automatica.2013.01.016, 2013.

1 Apr 2013

This paper considers the state-space smoothing problem in a distributed fashion. In the scenario of sensor networks, we assume that the nodes can be ordered in space and have access to noisy measurements relative to different but correlated states. The goal of each node is to obtain the minimum variance estimate of its own state conditional on all the data collected by the network using only local exchanges of information. We present a cooperative smoothing algorithm for Gauss–Markov linear models and provide an exact convergence analysis, clarifying its advantages over the Jacobi algorithm. Extensions of the numerical scheme that perform field estimation using a grid of sensors are also derived.
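The Jacobi algorithm, the baseline the cooperative smoother is compared against, can be sketched on a toy tridiagonal system (the diagonally dominant matrix below is an arbitrary stand-in, not the paper's model): each unknown is updated using only its neighbors' previous values, converging at a linear rate.

```python
import numpy as np

# Arbitrary strictly diagonally dominant tridiagonal system (illustrative).
n = 30
A = (np.diag(4.0 * np.ones(n))
     + np.diag(-1.0 * np.ones(n - 1), 1)
     + np.diag(-1.0 * np.ones(n - 1), -1))
b = np.sin(np.linspace(0.0, 3.0, n))

x_exact = np.linalg.solve(A, b)

# Jacobi iteration: x_{k+1} = D^{-1} (b - (A - D) x_k); each component
# needs only its neighbors' previous values (a "local exchange").
D = np.diag(A)
R = A - np.diag(D)
x_jacobi = np.zeros(n)
for _ in range(200):
    x_jacobi = (b - R @ x_jacobi) / D
```

Diagonal dominance guarantees convergence here, but only linearly, which is the kind of limitation exact distributed smoothers aim to avoid.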

An l1-Laplace robust Kalman smoother

Bell, B., G. Pillonetto, and J. Burke, "An l1-Laplace robust Kalman smoother," IEEE Trans. Autom. Control, 56, 2898-2911, doi:10.1109/TAC.2011.2141430, 2011.

1 Dec 2011

Robustness is a major problem in Kalman filtering and smoothing that can be solved using heavy tailed distributions; e.g., l1-Laplace. This paper describes an algorithm for finding the maximum a posteriori (MAP) estimate of the Kalman smoother for a nonlinear model with Gaussian process noise and l1-Laplace observation noise. The algorithm uses the convex composite extension of the Gauss-Newton method. This yields convex programming subproblems to which an interior point path-following method is applied. The number of arithmetic operations required by the algorithm grows linearly with the number of time points because the algorithm preserves the underlying block tridiagonal structure of the Kalman smoother problem. Excellent fits are obtained with and without outliers, even though the outliers are simulated from distributions that are not l1-Laplace. It is also tested on actual data with a nonlinear measurement model for an underwater tracking experiment. The l1-Laplace smoother is able to construct a smoothed fit, without data removal, from data with very large outliers.
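A toy illustration (not the paper's smoother) of why an l1/Laplace loss resists outliers: when the state reduces to a single constant, the Gaussian (L2) loss yields the sample mean while the Laplace (L1) loss yields the sample median, which a single huge outlier barely moves.

```python
import numpy as np

# Illustrative data: five consistent readings and one gross outlier.
data = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 100.0])

l2_estimate = data.mean()       # minimizes the sum of squared residuals
l1_estimate = np.median(data)   # minimizes the sum of absolute residuals
```

The L2 estimate is dragged far from the bulk of the data by the outlier, while the L1 estimate stays near 1.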

Learning using state space kernel machines

Aravkin, A., B. Bell, J.V. Burke, and G. Pillonetto, "Learning using state space kernel machines," In Proc., 18th World Congress of the International Federation of Automatic Control, 28 August - 2 September, Milan, Italy, doi:10.3182/20110828-6-1002.02726 (IFAC/Elsevier, 2011).

28 Aug 2011

Reconstruction of a function from noisy data is often formulated as a regularized optimization problem whose solution closely matches an observed data set and also has a small reproducing kernel Hilbert space norm. The loss functions that measure agreement between the data and the function are often smooth (e.g. the least squares penalty), but non-smooth loss functions are of interest in many applications. Using the least squares penalty, large machine learning problems with kernels amenable to a stochastic state space representation (which we call state space kernel machines) have been solved using a Kalman smoothing approach. In this paper we extend this approach for state space kernel machines to the Vapnik penalty, a particularly important non-smooth penalty that is robust to outlier noise in the measurements and induces a sparse representation of the reconstructed function. We exploit the structure of such models using interior point methods that efficiently solve the functional recovery problem with the Vapnik penalty, and demonstrate the effectiveness of the method on a numerical experiment. The computational effort of our approach scales linearly with the number of data points.
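The Vapnik (epsilon-insensitive) penalty mentioned above takes only a few lines to state: residuals inside a tube of half-width eps cost nothing, and larger residuals are penalized linearly, which both tolerates small noise and resists outliers (the residual values below are arbitrary examples).

```python
import numpy as np

# Vapnik epsilon-insensitive loss: zero inside the tube |r| <= eps,
# linear outside, so small residuals are ignored and large residuals
# grow the penalty only linearly.
def vapnik(r, eps=0.1):
    return np.maximum(np.abs(r) - eps, 0.0)

residuals = np.array([0.05, -0.08, 0.3, -2.0])
losses = vapnik(residuals)
```

The zero-loss region is also what induces the sparse representation the abstract refers to: data points inside the tube do not contribute to the solution.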

PhilSea10 APL-UW Cruise Report: 5-29 May 2010

Andrew, R.K., J.A. Mercer, B.M. Bell, A.A. Ganse, L. Buck, T. Wen, and T.M. McGinnis, "PhilSea10 APL-UW Cruise Report: 5-29 May 2010," APL-UW TR 1001, October 2010.

30 Oct 2010

A team from the Applied Physics Laboratory of the University of Washington (APL-UW) conducted underwater sound propagation exercises from 5 to 29 May 2010 aboard the R/V Roger Revelle in the Philippine Sea. This research cruise was part of a larger multi-cruise, multi-institution effort, the PhilSea10 Experiment, sponsored by the Office of Naval Research, to investigate the deterministic and stochastic properties of long-range deep ocean sound propagation in a region of energetic oceanographic processes. The primary objective of the APL-UW cruise was to transmit acoustic signals from electro-acoustic transducers suspended from the R/V Roger Revelle to an autonomous distributed vertical line array (DVLA) deployed in March by a team from the Scripps Institution of Oceanography (SIO). The DVLA will be recovered in March 2011.

Two transmission events took place from a location designated SS500, approximately 509 km to the southeast of the DVLA: a 54-hr event using the HX554 transducer at 1000 m depth, and a 55-hr event using the MP200/TR1446 "multiport" transducer at 1000 m depth. A third event took place towing the HX554 at a depth of 150 m at roughly 1–2 kt for 10 hr on a radial line 25–43 km away from the DVLA. All acoustic events broadcast low-frequency (61–300 Hz) m-sequences continuously except for a short gap each hour to synchronize transmitter computer files. An auxiliary cruise objective was to obtain high temporal and spatial resolution measurements of the sound speed field between SS500 and the DVLA.

Two methods were used: tows of an experimental "CTD chain" (TCTD) and periodic casts of the ship's CTD. The TCTD consisted of 88 CTD sensors on an inductive seacable 800 m long, and was designed to sample the water column to 500 m depth from all sensors every few seconds. Two tows were conducted, both starting near SS500 and following the path from SS500 towards the DVLA, for distances of 93 km and 124 km. Only several dozen sensors responded during sampling. While the temperature data appear reasonable, only about one-half the conductivity measurements and none of the pressure measurements can be used. Ship CTD casts were made to 1500 m depth every 10 km, with every fifth cast to full ocean depth.

An inequality constrained nonlinear Kalman-Bucy smoother by interior point likelihood maximization

Bell, B.M., J.V. Burke, G. Pillonetto, "An inequality constrained nonlinear Kalman-Bucy smoother by interior point likelihood maximization," Automatica, 45, 25-33, doi:10.1016/j.automatica.2008.05.029, 2009.

1 Jan 2009

Kalman–Bucy smoothers are often used to estimate the state variables as a function of time in a system with stochastic dynamics and measurement noise. This is accomplished using an algorithm for which the number of numerical operations grows linearly with the number of time points. All of the randomness in the model is assumed to be Gaussian. Including other available information, for example a bound on one of the state variables, is nontrivial because it does not fit into the standard Kalman–Bucy smoother algorithm. In this paper we present an interior point method that maximizes the likelihood with respect to the sequence of state vectors satisfying inequality constraints. The method obtains the same decomposition that is normally obtained for the unconstrained Kalman–Bucy smoother, hence the resulting number of operations grows linearly with the number of time points. We present two algorithms, the first is for the affine case and the second is for the nonlinear case. Neither algorithm requires the optimization to start at a feasible sequence of state vector values. Both the unconstrained affine and unconstrained nonlinear Kalman–Bucy smoothers are special cases of the class of problems that can be handled by these algorithms.
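The interior-point idea can be sketched in one variable (hypothetical numbers; the paper works with full state sequences): enforce the bound x >= 0 by adding a log barrier -mu*log(x) to the negative log-likelihood and driving mu toward zero, keeping every iterate strictly feasible.

```python
# One-variable interior-point sketch: minimize 0.5*(x - y)^2 subject to
# x >= 0 via the barrier objective 0.5*(x - y)^2 - mu*log(x), mu -> 0.
y_meas = -2.0   # unconstrained optimum x = y_meas violates x >= 0
x_est = 1.0     # strictly feasible starting point

for mu in [1.0, 0.1, 0.01, 1e-4, 1e-6]:
    for _ in range(50):                       # damped Newton steps
        grad = (x_est - y_meas) - mu / x_est
        hess = 1.0 + mu / x_est ** 2
        step = grad / hess
        while x_est - step <= 0.0:            # damp to stay feasible
            step *= 0.5
        x_est -= step
# as mu -> 0 the barrier solution approaches the active bound x = 0
```

Here the constraint is active at the solution, so the estimate converges to the boundary; no feasible-start phase beyond picking any x > 0 is needed, mirroring the path-following structure the paper exploits.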

Algorithmic differentiation of implicit functions and optimal values

Bell, B.M., and J.V. Burke, "Algorithmic differentiation of implicit functions and optimal values," in Advances in Automatic Differentiation, eds., C. Bischof, H. Bucker, P. Hovland, U. Naumann, and J. Utke, 67-78, doi:10.1007/978-3-540-68942-3_7 (Springer, 2008).

31 Dec 2008

Approximating the Bayesian inverse for nonlinear dynamical systems

Bell, B.M., and G. Pillonetto, "Approximating the Bayesian inverse for nonlinear dynamical systems," J. Phys.: Conf. Ser., 124, doi:10.1088/1742-6596/012009, 2008.

21 Aug 2008

We are given a sequence of measurement vectors and a possibly nonlinear relation to a corresponding sequence of state vectors. We are also given a possibly nonlinear model for the dynamics of the state vectors. The goal is to estimate (invert for) the state vectors when there is noise in both the measurements and the dynamics of the state. In general, obtaining the minimum variance (Bayesian) estimate of the state vectors is difficult because it requires evaluation of high-dimensional integrals with no closed analytic form. We use a block tridiagonal Cholesky algorithm to simulate from the Laplace approximation for the posterior of the state vectors. These simulations are used as a proposal density for the random-walk Metropolis algorithm to obtain samples from the actual posterior. This provides a means to approximate the minimum variance estimate, as well as confidence intervals, for the state vector sequence. Simulations of a fed-batch bio-reactor model are used to demonstrate that this approach obtains better estimates and confidence intervals than the iterated Kalman smoother.
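The sampling strategy can be sketched in one dimension (the target density below is an arbitrary non-Gaussian stand-in, not the bio-reactor posterior): draw proposals from a Gaussian (Laplace-style) approximation of the posterior and correct them with a Metropolis-Hastings accept/reject step, then estimate the posterior mean from the samples.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_post(x):
    # arbitrary non-Gaussian target, known up to a normalizing constant
    return -0.5 * x ** 2 - 0.25 * x ** 4

def log_prop(x):
    # Laplace approximation: mode 0, curvature 1 -> N(0, 1) proposal
    return -0.5 * x ** 2

x = 0.0
samples = []
for _ in range(20000):
    z = rng.standard_normal()                       # independence proposal
    log_ratio = (log_post(z) - log_post(x)) + (log_prop(x) - log_prop(z))
    if np.log(rng.uniform()) < log_ratio:
        x = z
    samples.append(x)

posterior_mean = np.mean(samples[2000:])            # discard burn-in
```

Because the target is symmetric about zero, the true posterior mean is 0, which the Monte Carlo estimate recovers.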

Optimal smoothing of non-linear dynamic systems via Monte Carlo Markov chains

Pillonetto, G., and B.M. Bell, "Optimal smoothing of non-linear dynamic systems via Monte Carlo Markov chains," Automatica, 44, 1676-1685, doi:10.1016/j.automatica.2007.10.028, 2008.

1 Jul 2008

We consider the smoothing problem of estimating a sequence of state vectors given a nonlinear state space model with additive white Gaussian noise, and measurements of the system output. The system output may also be nonlinearly related to the system state. Often, obtaining the minimum variance state estimates conditioned on output data is analytically intractable. To tackle this difficulty, a Markov chain Monte Carlo technique is presented. The proposal density for this method efficiently draws samples from the Laplace approximation of the posterior distribution of the state sequence given the measurement sequence. This proposal density is combined with the Metropolis-Hastings algorithm to generate realizations of the state sequence that converge to the proper posterior distribution. The minimum variance estimate and confidence intervals are approximated using these realizations. Simulations of a fed-batch bioreactor model are used to demonstrate that the proposed method can obtain significantly better estimates than the iterated Kalman-Bucy smoother.

Bayes and empirical Bayes semi-blind deconvolution using eigenfunctions of a prior covariance

Pillonetto, G., and B.M. Bell, "Bayes and empirical Bayes semi-blind deconvolution using eigenfunctions of a prior covariance," Automatica, 43, 1698-1712, doi:10.1016/j.automatica.2007.02.025, 2007.

1 Oct 2007

We consider the semi-blind deconvolution problem; i.e., estimating an unknown input function to a linear dynamical system using a finite set of linearly related measurements where the dynamical system is known up to some system parameters. Without further assumptions, this problem is often ill-posed and ill-conditioned. We overcome this difficulty by modeling the unknown input as a realization of a stochastic process with a covariance that is known up to some finite set of covariance parameters. We first present an empirical Bayes method where the unknown parameters are estimated by maximizing the marginal likelihood/posterior and subsequently the input is reconstructed via a Tikhonov estimator (with the parameters set to their point estimates). Next, we introduce a Bayesian method that recovers the posterior probability distribution, and hence the minimum variance estimates, for both the unknown parameters and the unknown input function. Both of these methods use the eigenfunctions of the random process covariance to obtain an efficient representation of the unknown input function and its probability distributions. Simulated case studies are used to test the two methods and compare their relative performance.

Estimating in vitro mitochondrial oxygen consumption during muscle contraction and recovery: A novel approach that accounts for diffusion

Dash, R.K., B.M. Bell, M.J. Kushmerick, and P. Vicini, "Estimating in vitro mitochondrial oxygen consumption during muscle contraction and recovery: A novel approach that accounts for diffusion," Ann. Biomed. Eng., 33, 343-355, 2005.

30 Mar 2005

A deconvolution algorithm, based on a Bayesian statistical framework and smoothing spline technique, is applied to reconstructing input functions from noisy measurements in biological systems. Deconvolution is usually ill-posed. However, placing a Bayesian prior distribution on the input function can make the problem well-posed. Using this algorithm and a computational model of diffusional oxygen transport in an approximately cylindrical muscle (about 0.5-mm diameter and 10-mm long mouse leg muscle), the time course of muscle oxygen uptake and mitochondrial oxygen consumption, both during isometric twitch contractions (at various frequencies) and the recovery period, is estimated from polarographic measurements of oxygen concentration on the muscle surface. An important feature of our experimental protocol is the availability of data for the apparatus characteristics. From these time courses, the actual mitochondrial consumption rates during resting and exercise states can be estimated. Mitochondrial oxygen consumption rate increased during stimulation to a maximum steady state value approximately five times the resting value of 0.63 nmol/s/g wet weight for the stimulation conditions studied. Diffusion slowed the kinetic responses to the contraction but not the steady state fluxes during the stimulation interval.

Contributions of the input signal and prior activation history to the discharge behaviour of rat motoneurones

Powers, R.K., Y. Dai, B.M. Bell, D.B. Percival, and M.D. Binder, "Contributions of the input signal and prior activation history to the discharge behaviour of rat motoneurones," J. Physiol., 562, 707-724, doi:10.1113/jphysiol.2004.069039, 2005.

1 Feb 2005

Estimating parameters and stochastic functions of one variable using nonlinear measurement models

Bell, B.M., and G. Pillonetto, "Estimating parameters and stochastic functions of one variable using nonlinear measurement models," Inverse Probl., 20, 627-646, doi:10.1088/0266-5611/20/3/001, 2004.

5 Mar 2004

Estimating an unknown function of one variable from a finite set of measurements is an ill-posed inverse problem. Placing a Bayesian prior on a function space is one way to make this problem well-posed. The problem can remain well-posed even if the relationship between the unknown function and the measurements, as well as the function space prior, contains unknown parameters. We present a method for estimating the unknown parameters by maximizing an approximation of the marginal likelihood where the unknown function has been integrated out. This is an extension of marginal likelihood estimators for the regularization parameter because we allow for a nonlinear relationship between the unknown function and the measurements. The estimate of the function is then obtained by maximizing its a posteriori probability density function given the parameters and the data. We present a computational method that uses eigenfunctions to represent the function space. The continuity properties of the function estimate are characterized. Proofs of the convergence of the method are included. The importance of allowing for a nonlinear transformation is demonstrated by a stochastic sum of exponentials example.

Deconvolution of non-stationary physical signals: A smooth variance model for insulin secretion rate

Pillonetto, G., and B.M. Bell, "Deconvolution of non-stationary physical signals: A smooth variance model for insulin secretion rate," Inverse Probl., 20, 367-383, doi: 10.1088/0266-5611/20/2/004, 2004.

16 Jan 2004

Deconvolution is the process of estimating a system's input using measurements of a causally related output where the relationship between the input and output is known and linear. Regularization parameters are used to balance smoothness of the estimated input with accuracy of the measurement values. In this paper we present a maximum marginal likelihood method for estimating unknown regularization (and other) parameters used during deconvolution of dynamical systems. Our computational approach uses techniques that were developed for Kalman filters and smoothers. As an example application we consider estimating insulin secretion rate (ISR) following an intravenous glucose stimulus. This procedure is referred to in the medical literature as an intravenous glucose tolerance test (IVGTT). This estimation problem is difficult because ISR is a strongly non-stationary signal; it presents a fast peak in the first minutes of the experiment, followed by a smoother release. We use three regularization parameters to define a smooth model for ISR variance. This model takes into account the rapid variation of ISR during the beginning of an IVGTT and its slower variation as time progresses. Simulations are used to assess marginal likelihood estimation of these regularization parameters as well as of other parameters in the system. Simulations are also used to compare our model for ISR variance with other stochastic ISR models. In addition, we apply maximum marginal likelihood and our ISR variance model to real data for which ISR estimation results have previously been reported in the literature.
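Maximum marginal likelihood estimation of a regularization parameter can be sketched with a scalar variance model, far simpler than the ISR model above (all values below are assumptions): with y_i = f_i + e_i, f_i ~ N(0, tau), e_i ~ N(0, sigma2), the marginal distribution is y_i ~ N(0, tau + sigma2), and maximizing the marginal likelihood over tau gives mean(y^2) - sigma2, clipped at zero.

```python
import numpy as np

# Illustrative scalar variance model with assumed true values.
rng = np.random.default_rng(3)
tau_true, sigma2 = 4.0, 1.0        # signal and (known) noise variances
n = 100000
y = (rng.normal(0.0, np.sqrt(tau_true), n)
     + rng.normal(0.0, np.sqrt(sigma2), n))

# Marginally y ~ N(0, tau + sigma2); the maximum marginal likelihood
# estimate of tau follows in closed form.
tau_hat = max(0.0, np.mean(y ** 2) - sigma2)
```

The signal variance tau plays the role of a regularization parameter: it is estimated from the data alone, with the unknown signal integrated out.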

Approximating the marginal likelihood estimate for models with random parameters

Bell, B.M., "Approximating the marginal likelihood estimate for models with random parameters," Appl. Math. Comput., 119, 57-75, 2001.

25 Mar 2001

Often a model for the mean and variance of a measurement set is naturally expressed in terms of both deterministic and random parameters. Each of the deterministic parameters has one fixed value while the random parameters come from a distribution of values. We restrict our attention to the case where the random parameters and the measurement error have a Gaussian distribution. In this case, the joint likelihood of the data and random parameters is an extended least squares function. The likelihood of the data alone is the integral of this extended least squares function with respect to the random parameters. This is the likelihood that we would like to optimize, but we do not have a closed form expression for the integral. We use Laplace's method to obtain an approximation for the likelihood of the data alone. Maximizing this approximation is less computationally demanding than maximizing the integral expression, but this yields a different estimator. In addition, evaluation of the approximation requires second derivatives of the original model functions. If we were to use this approximation as our objective function, evaluation of the derivative of the objective would require third derivatives of the original model functions. We present modified approximations that are expressed using only values of the original model functions. Evaluation of the derivative of the modified approximations only requires first derivatives of the original model functions. We use Monte Carlo techniques to approximate the difference between an arbitrary estimator and the estimator that maximizes the likelihood of the data alone. In addition, we approximate the information matrix corresponding to the estimator that maximizes the likelihood of the data alone.
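Laplace's method, the tool used here to integrate out the random parameters, can be checked on a classic standalone integral: approximating the integrand by a Gaussian around its maximizer and applying it to n! = ∫ xⁿ e⁻ˣ dx yields Stirling's formula, with relative error of about 1/(12n).

```python
import math

# Laplace's method: I = ∫ exp(-g(x)) dx ≈ exp(-g(x*)) * sqrt(2*pi / g''(x*))
# where x* minimizes g. For g(x) = x - n*log(x), x* = n and g''(x*) = 1/n,
# giving Stirling's approximation of n! = ∫ x^n e^{-x} dx.
def stirling(n):
    return math.sqrt(2.0 * math.pi * n) * (n / math.e) ** n

n = 20
exact = math.factorial(n)
rel_err = abs(stirling(n) - exact) / exact   # about 1/(12*20) ≈ 0.004
```

The same second-order expansion around the maximizer is what makes the marginal likelihood approximation in the paper tractable.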

The marginal likelihood for parameters in a discrete Gauss-Markov process

Bell, B.M., "The marginal likelihood for parameters in a discrete Gauss-Markov process," IEEE Trans. Signal Process., 48, 870-873, 2000.

1 Mar 2000

We use Laplace's method to approximate the marginal likelihood for parameters in a Gauss-Markov process. This approximation requires the determinant of a matrix whose dimensions are equal to the number of state variables times the number of time points. We reduce this to sequential evaluation of determinants and inverses of smaller matrices, and we show that this is a numerically stable method.
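The sequential reduction described above can be sketched for a symmetric block tridiagonal matrix (the random blocks below are an arbitrary stand-in): the determinant factors into small Schur-complement determinants computed in one forward sweep, linear in the number of time points.

```python
import numpy as np

rng = np.random.default_rng(2)
m, T = 3, 8  # state dimension, number of time points

# symmetric positive definite diagonal blocks, small sub-diagonal blocks
diag_blocks = []
for _ in range(T):
    B = 0.1 * rng.standard_normal((m, m))
    diag_blocks.append(5.0 * np.eye(m) + 0.5 * (B + B.T))
sub_blocks = [0.1 * rng.standard_normal((m, m)) for _ in range(T - 1)]

# dense reference: assemble the full (m*T) x (m*T) matrix
M = np.zeros((m * T, m * T))
for k in range(T):
    M[k * m:(k + 1) * m, k * m:(k + 1) * m] = diag_blocks[k]
for k in range(T - 1):
    M[(k + 1) * m:(k + 2) * m, k * m:(k + 1) * m] = sub_blocks[k]
    M[k * m:(k + 1) * m, (k + 1) * m:(k + 2) * m] = sub_blocks[k].T
dense_log_det = np.linalg.slogdet(M)[1]

# sequential sweep: log det(M) = sum_k log det(S_k), with
# S_1 = A_1 and S_k = A_k - C_{k-1} S_{k-1}^{-1} C_{k-1}^T
log_det = 0.0
S = diag_blocks[0]
for k in range(T):
    log_det += np.linalg.slogdet(S)[1]
    if k < T - 1:
        S = (diag_blocks[k + 1]
             - sub_blocks[k] @ np.linalg.solve(S, sub_blocks[k].T))
```

Only m x m determinants and solves appear in the sweep, and working with log-determinants avoids overflow, which is the numerically stable form of the computation.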


Inventions

SAAM II Software System

Record of Invention Number: 43605

Brad Bell, David Foster, Walt Reissig, Paolo Vicini, Hellmut Golde


22 Apr 2011

Weighted Nonlinear Least Squares Optimizer (WNLSQ)

Record of Invention Number: 1568PR-Reg-0002

Brad Bell, Gail Badner


11 Jun 2009
