## Publications

Zhao S, Van Dyk D, Imai K, Propensity-score based methods for causal inference in observational studies with non-binary treatments, *Statistical Methods in Medical Research*, ISSN: 0962-2802

Propensity score methods are part of the standard toolkit for applied researchers who wish to ascertain causal effects from observational data. While they were originally developed for binary treatments, several researchers have proposed generalizations of the propensity score methodology for non-binary treatment regimes. Such extensions have widened the applicability of propensity score methods and are indeed becoming increasingly popular themselves. In this article, we closely examine two methods that generalize propensity scores in this direction, namely, the propensity function (pf) and the generalized propensity score (gps), along with two extensions of the gps that aim to improve its robustness. We compare the assumptions, theoretical properties, and empirical performance of these methods. On a theoretical level, the gps and its extensions are advantageous in that they are designed to estimate the full dose response function rather than the average treatment effect that is estimated with the pf. We compare gps with a new pf method, both of which estimate the dose response function. We illustrate our findings and proposals through simulation studies, including one based on an empirical study about the effect of smoking on healthcare costs. While our proposed pf-based estimator performs well, we generally advise caution in that all available methods can be biased by model misspecification and extrapolation.

Algeri S, Dyk DAV, Testing one hypothesis multiple times: the multidimensional case, *Journal of Computational and Graphical Statistics*, ISSN: 1061-8600

The identification of new rare signals in data, the detection of a sudden change in a trend, and the selection of competing models are among the most challenging problems in statistical practice. These challenges can be tackled using a test of hypothesis where a nuisance parameter is present only under the alternative, and a computationally efficient solution can be obtained by the "Testing One Hypothesis Multiple times" (TOHM) method. In the one-dimensional setting, a fine discretization of the space of the non-identifiable parameter is specified, and a global p-value is obtained by approximating the distribution of the supremum of the resulting stochastic process. In this paper, we propose a computationally efficient inferential tool to perform TOHM in the multidimensional setting. Here, the approximations of interest typically involve the expected Euler Characteristics (EC) of the excursion set of the underlying random field. We introduce a simple algorithm to compute the EC in multiple dimensions and for arbitrarily large significance levels. This leads to a highly generalizable computational tool to perform inference under non-standard regularity conditions.

Algeri S, Van Dyk D, Testing one hypothesis multiple times, *Statistica Sinica*, ISSN: 1017-0405

In applied settings, tests of hypothesis where a nuisance parameter is only identifiable under the alternative often reduce to one of Testing One Hypothesis Multiple times (TOHM). Specifically, a fine discretization of the space of the non-identifiable parameter is specified, and the null hypothesis is tested against a set of sub-alternative hypotheses, one for each point of the discretization. The resulting sub-test statistics are then combined to obtain a global p-value. In this paper, we discuss a computationally efficient inferential tool to perform TOHM under stringent significance requirements, such as those typically required in the physical sciences (e.g., p-value $< 10^{-7}$). The resulting procedure leads to a generalized approach to perform inference under non-standard conditions, including non-nested model comparisons.

Stampoulis V, Van Dyk D, Kashyap VL, et al., 2019, Multidimensional data driven classification of emission-line galaxies, *Monthly Notices of the Royal Astronomical Society*, Vol: 485, Pages: 1085-1102, ISSN: 0035-8711

We propose a new soft clustering scheme for classifying galaxies in different activity classes using simultaneously four emission-line ratios: log ([NII]/H α), log ([SII]/H α), log ([OI]/H α), and log ([OIII]/H β). We fit 20 multivariate Gaussian distributions to the four-dimensional distribution of these lines obtained from the Sloan Digital Sky Survey in order to capture local structures and subsequently group the multivariate Gaussian distributions to represent the complex multidimensional structure of the joint distribution of galaxy spectra in the four-dimensional line ratio space. The main advantages of this method are the use of all four optical-line ratios simultaneously and the adoption of a clustering scheme. This maximizes the use of the available information, avoids contradicting classifications, and treats each class as a distribution resulting in soft classification boundaries and providing the probability for an object to belong to each class. We also introduce linear multidimensional decision surfaces using support vector machines based on the classification of our soft clustering scheme. This linear multidimensional hard clustering technique shows high classification accuracy with respect to our soft clustering scheme.
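
The soft classification idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses one line ratio instead of four, univariate Gaussians instead of multivariate ones, and the component weights, means, widths, and class labels in `COMPONENTS` are hypothetical values, not fitted results from the paper.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def class_probabilities(x, components):
    """Return the probability that x belongs to each class.

    `components` is a list of (class_label, weight, mean, sd) tuples.
    Several Gaussian components may share one class label; their
    weighted densities are summed before normalizing.
    """
    totals = {}
    for label, weight, mean, sd in components:
        totals[label] = totals.get(label, 0.0) + weight * gaussian_pdf(x, mean, sd)
    norm = sum(totals.values())
    return {label: value / norm for label, value in totals.items()}


# Hypothetical 1-D mixture: two components grouped into one class, one into another.
COMPONENTS = [
    ("star-forming", 0.4, -0.6, 0.15),
    ("star-forming", 0.2, -0.4, 0.10),
    ("AGN", 0.4, 0.0, 0.20),
]

probs_low = class_probabilities(-0.6, COMPONENTS)
probs_high = class_probabilities(0.2, COMPONENTS)
```

Each object receives a probability for every class rather than a single hard label, which is what makes the class boundaries soft.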

Chen Y, Meng XL, Wang X, et al., 2019, Calibration concordance for astronomical instruments via multiplicative shrinkage, *Journal of the American Statistical Association*, Vol: 114, Pages: 1018-1037, ISSN: 0162-1459

Calibration data are often obtained by observing several well-understood objects simultaneously with multiple instruments, such as satellites for measuring astronomical sources. Analyzing such data and obtaining proper concordance among the instruments is challenging when the physical source models are not well understood, when there are uncertainties in “known” physical quantities, or when data quality varies in ways that cannot be fully quantified. Furthermore, the number of model parameters increases with both the number of instruments and the number of sources. Thus, concordance of the instruments requires careful modeling of the mean signals, the intrinsic source differences, and measurement errors. In this article, we propose a log-Normal model and a more general log-t model that respect the multiplicative nature of the mean signals via a half-variance adjustment, yet permit imperfections in the mean modeling to be absorbed by residual variances. We present analytical solutions in the form of power shrinkage in special cases and develop reliable Markov chain Monte Carlo algorithms for general cases, both of which are available in the Python module CalConcordance. We apply our method to several datasets including a combination of observations of active galactic nuclei (AGN) and spectral line emission from the supernova remnant E0102, obtained with a variety of X-ray telescopes such as Chandra, XMM-Newton, Suzaku, and Swift. The data are compiled by the International Astronomical Consortium for High Energy Calibration. We demonstrate that our method provides helpful and practical guidance for astrophysicists when adjusting for disagreements among instruments. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
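
One way to read the "half-variance adjustment" is through the standard mean of a log-Normal variable. The symbols $y$, $m$, and $\sigma^2$ below are illustrative and the paper's exact parameterization is assumed, not quoted:

$$
\log y \sim N(\mu, \sigma^2) \;\Longrightarrow\; E[y] = \exp\!\left(\mu + \tfrac{1}{2}\sigma^2\right),
\qquad\text{so}\qquad
\log y \sim N\!\left(\log m - \tfrac{1}{2}\sigma^2,\; \sigma^2\right) \;\Longrightarrow\; E[y] = m .
$$

Subtracting half the variance on the log scale keeps the mean on the original (multiplicative) scale equal to $m$, whatever the residual variance is.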

Hill R, Shariff H, Trotta R, et al., 2018, Projected distances to host galaxy reduce SNIa dispersion, *Monthly Notices of the Royal Astronomical Society*, Vol: 481, Pages: 2766-2777, ISSN: 0035-8711

We use multi-band imagery data from the Sloan Digital Sky Survey (SDSS) to measure projected distances of 302 type Ia supernovae (SNIa) from the centre of their host galaxies, normalized to the galaxy's brightness scale length, with a Bayesian approach. We test the hypothesis that SNIas further away from the centre of their host galaxy are less subject to dust contamination (as the dust column density in their environment is smaller) and/or come from a more homogeneous environment. Using the Mann-Whitney U test, we find a statistically significant difference in the observed colour correction distribution between SNIas that are near and those that are far from the centre of their host. The local p-value is $3 \times 10^{-3}$, which is significant at the 5 per cent level after look-elsewhere effect correction. We estimate the residual scatter of the two subgroups to be $0.073 \pm 0.018$ for the far SNIas, compared to $0.114 \pm 0.009$ for the near SNIas, an improvement of 30 per cent, albeit with a low statistical significance of $2\sigma$. This confirms the importance of host galaxy properties in correctly interpreting SNIa observations for cosmological inference.

Yu X, Del Zanna G, Stenning D, et al., 2018, Incorporating uncertainties in atomic data into the analysis of solar and stellar observations: a case study in Fe XIII, *The Astrophysical Journal: an international review of astronomy and astronomical physics*, Vol: 866, ISSN: 0004-637X

Information about the physical properties of astrophysical objects cannot be measured directly but is inferred by interpreting spectroscopic observations in the context of atomic physics calculations. Ratios of emission lines, for example, can be used to infer the electron density of the emitting plasma. Similarly, the relative intensities of emission lines formed over a wide range of temperatures yield information on the temperature structure. A critical component of this analysis is understanding how uncertainties in the underlying atomic physics propagate to the uncertainties in the inferred plasma parameters. At present, however, atomic physics databases do not include uncertainties on the atomic parameters and there is no established methodology for using them even if they did. In this paper we develop simple models for uncertainties in the collision strengths and decay rates for Fe XIII and apply them to the interpretation of density-sensitive lines observed with the EUV (extreme ultraviolet) Imaging Spectrometer (EIS) on Hinode. We incorporate these uncertainties in a Bayesian framework. We consider both a pragmatic Bayesian method where the atomic physics information is unaffected by the observed data, and a fully Bayesian method where the data can be used to probe the physics. The former generally increases the uncertainty in the inferred density by about a factor of 5 compared with models that incorporate only statistical uncertainties. The latter reduces the uncertainties on the inferred densities, but identifies areas of possible systematic problems with either the atomic physics or the observed intensities.

Si S, van Dyk DA, von Hippel T, et al., 2018, Bayesian hierarchical modelling of initial-final mass relations across star clusters, *Monthly Notices of the Royal Astronomical Society*, Vol: 480, Pages: 1300-1321, ISSN: 0035-8711

The initial–final mass relation (IFMR) of white dwarfs (WDs) plays an important role in stellar evolution. To derive precise estimates of IFMRs and explore how they may vary among star clusters, we propose a Bayesian hierarchical model that pools photometric data from multiple star clusters. After performing a simulation study to show the benefits of the Bayesian hierarchical model, we apply this model to five star clusters: the Hyades, M67, NGC 188, NGC 2168, and NGC 2477, leading to reasonable and consistent estimates of IFMRs for these clusters. We illustrate how a cluster-specific analysis of NGC 188 using its own photometric data can produce an unreasonable IFMR since its WDs have a narrow range of zero-age main sequence (ZAMS) masses. However, the Bayesian hierarchical model corrects the cluster-specific analysis by borrowing strength from other clusters, thus generating more reliable estimates of IFMR parameters. The data analysis presents the benefits of Bayesian hierarchical modelling over conventional cluster-specific methods, which motivates us to elaborate the powerful statistical techniques in this paper.

Chen Y, Meng X-L, Wang X, et al., 2018, Calibration concordance for astronomical instruments via multiplicative shrinkage, Publisher: arXiv

Calibration data are often obtained by observing several well-understood objects simultaneously with multiple instruments, such as satellites for measuring astronomical sources. Analyzing such data and obtaining proper concordance among the instruments is challenging when the physical source models are not well understood, when there are uncertainties in "known" physical quantities, or when data quality varies in ways that cannot be fully quantified. Furthermore, the number of model parameters increases with both the number of instruments and the number of sources. Thus, concordance of the instruments requires careful modeling of the mean signals, the intrinsic source differences, and measurement errors. In this paper, we propose a log-Normal hierarchical model and a more general log-t model that respect the multiplicative nature of the mean signals via a half-variance adjustment, yet permit imperfections in the mean modeling to be absorbed by residual variances. We present analytical solutions in the form of power shrinkage in special cases and develop reliable MCMC algorithms for general cases. We apply our method to several data sets obtained with a variety of X-ray telescopes such as Chandra. We demonstrate that our method provides helpful and practical guidance for astrophysicists when adjusting for disagreements among instruments.

Tak H, Meng X-L, van Dyk DA, 2018, A repelling–attracting Metropolis algorithm for multimodality, *Journal of Computational and Graphical Statistics*, Vol: 27, Pages: 479-490, ISSN: 1061-8600

Although the Metropolis algorithm is simple to implement, it often has difficulties exploring multimodal distributions. We propose the repelling–attracting Metropolis (RAM) algorithm that maintains the simple-to-implement nature of the Metropolis algorithm, but is more likely to jump between modes. The RAM algorithm is a Metropolis-Hastings algorithm with a proposal that consists of a downhill move in density that aims to make local modes repelling, followed by an uphill move in density that aims to make local modes attracting. The downhill move is achieved via a reciprocal Metropolis ratio so that the algorithm prefers downward movement. The uphill move does the opposite using the standard Metropolis ratio which prefers upward movement. This down-up movement in density increases the probability of a proposed move to a different mode. Because the acceptance probability of the proposal involves a ratio of intractable integrals, we introduce an auxiliary variable which creates a term in the acceptance probability that cancels with the intractable ratio. Using several examples, we demonstrate the potential for the RAM algorithm to explore a multimodal distribution more efficiently than a Metropolis algorithm and with less tuning than is commonly required by tempering-based methods. Supplementary materials are available online.
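
The down-up proposal and the auxiliary-variable acceptance step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the bimodal target, the Gaussian random-walk proposal, the tuning values, and all function names are assumptions, and the exact form of the final acceptance ratio is written from the abstract's description plus my recollection of the published algorithm, so it should be checked against the paper before use.

```python
import math
import random


def log_target(x):
    """Log density (up to a constant) of an equal mixture of N(-4, 1) and N(4, 1)."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def forced_move(rng, start, scale, downhill):
    """Propose from N(start, scale^2) until one proposal is accepted.

    downhill=True uses the reciprocal Metropolis ratio (prefers lower density);
    downhill=False uses the standard Metropolis ratio (prefers higher density).
    """
    while True:
        proposal = rng.gauss(start, scale)
        log_ratio = log_target(proposal) - log_target(start)
        if downhill:
            log_ratio = -log_ratio
        if math.log(1.0 - rng.random()) < log_ratio:
            return proposal


def ram_step(rng, x, scale):
    """One down-up proposal followed by an accept/reject decision."""
    x_down = forced_move(rng, x, scale, downhill=True)
    x_star = forced_move(rng, x_down, scale, downhill=False)
    # Auxiliary variable: a forced downhill move started from the proposal.
    z = forced_move(rng, x_star, scale, downhill=True)
    lx, ls, lz = log_target(x), log_target(x_star), log_target(z)
    # Assumed acceptance ratio (log scale):
    # pi(x*) min{1, pi(x)/pi(z)} / ( pi(x) min{1, pi(x*)/pi(z)} )
    log_accept = (ls + min(0.0, lx - lz)) - (lx + min(0.0, ls - lz))
    if math.log(1.0 - rng.random()) < log_accept:
        return x_star
    return x


def ram_chain(n, x0=4.0, scale=4.0, seed=1):
    """Run n iterations starting from x0 and return the list of states."""
    rng = random.Random(seed)
    chain = [x0]
    for _ in range(n - 1):
        chain.append(ram_step(rng, chain[-1], scale))
    return chain


chain = ram_chain(5000)
```

Working on the log scale avoids dividing by densities that underflow to zero, which is the role a small additive constant would otherwise play.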

Revsbech EA, Trotta R, van Dyk DA, 2017, STACCATO: a novel solution to supernova photometric classification with biased training sets, *Monthly Notices of the Royal Astronomical Society*, Vol: 473, ISSN: 0035-8711

We present a new solution to the problem of classifying Type Ia supernovae from their light curves alone given a spectroscopically confirmed but biased training set, circumventing the need to obtain an observationally expensive unbiased training set. We use Gaussian processes (GPs) to model the supernovae's (SN's) light curves, and demonstrate that the choice of covariance function has only a small influence on the GPs' ability to accurately classify SNe. We extend and improve the approach of Richards et al. – a diffusion map combined with a random forest classifier – to deal specifically with the case of biased training sets. We propose a novel method called Synthetically Augmented Light Curve Classification (STACCATO) that synthetically augments a biased training set by generating additional training data from the fitted GPs. Key to the success of the method is the partitioning of the observations into subgroups based on their propensity score of being included in the training set. Using simulated light curve data, we show that STACCATO increases performance, as measured by the area under the Receiver Operating Characteristic curve (AUC), from 0.93 to 0.96, close to the AUC of 0.977 obtained using the ‘gold standard’ of an unbiased training set and significantly improving on the previous best result of 0.88. STACCATO also increases the true positive rate for SNIa classification by up to a factor of 50 for high-redshift/low-brightness SNe.

Tak H, Mandel K, Van Dyk DA, et al., 2017, Bayesian estimates of astronomical time delays between gravitationally lensed stochastic light curves, *Annals of Applied Statistics*, Vol: 11, Pages: 1309-1348, ISSN: 1941-7330

The gravitational field of a galaxy can act as a lens and deflect the light emitted by a more distant object such as a quasar. Strong gravitational lensing causes multiple images of the same quasar to appear in the sky. Since the light in each gravitationally lensed image traverses a different path length from the quasar to the Earth, fluctuations in the source brightness are observed in the several images at different times. The time delay between these fluctuations can be used to constrain cosmological parameters and can be inferred from the time series of brightness data or light curves of each image. To estimate the time delay, we construct a model based on a state-space representation for irregularly observed time series generated by a latent continuous-time Ornstein-Uhlenbeck process. We account for microlensing, an additional source of independent long-term extrinsic variability, via a polynomial regression. Our Bayesian strategy adopts a Metropolis-Hastings within Gibbs sampler. We improve the sampler by using an ancillarity-sufficiency interweaving strategy and adaptive Markov chain Monte Carlo. We introduce a profile likelihood of the time delay as an approximation of its marginal posterior distribution. The Bayesian and profile likelihood approaches complement each other, producing almost identical results; the Bayesian method is more principled but the profile likelihood is simpler to implement. We demonstrate our estimation strategy using simulated data of doubly- and quadruply-lensed quasars, and observed data from quasars Q0957+561 and J1029+2623.
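
To make the latent process concrete, here is a small sketch of exact simulation of an Ornstein-Uhlenbeck process at irregular observation times. It assumes the parameterization $dX(t) = -\tfrac{1}{\tau}(X(t) - \mu)\,dt + \sigma\,dB(t)$; the parameter names, the example times, and the values are illustrative, and the sketch leaves out measurement error, microlensing, and the time delay itself.

```python
import math
import random


def ou_step(rng, x_prev, dt, mu, sigma, tau):
    """Draw X(t + dt) given X(t) = x_prev from the exact OU transition density."""
    decay = math.exp(-dt / tau)
    mean = mu + decay * (x_prev - mu)
    var = 0.5 * tau * sigma ** 2 * (1.0 - decay ** 2)
    return rng.gauss(mean, math.sqrt(var))


def simulate_ou(times, mu, sigma, tau, seed=0):
    """Simulate the process at the given increasing (possibly irregular) times."""
    rng = random.Random(seed)
    # Start from the stationary distribution N(mu, tau * sigma^2 / 2).
    x = rng.gauss(mu, math.sqrt(0.5 * tau * sigma ** 2))
    path = [x]
    for t_prev, t_next in zip(times, times[1:]):
        x = ou_step(rng, x, t_next - t_prev, mu, sigma, tau)
        path.append(x)
    return path


# Irregularly spaced observation times (days), illustrative values only.
obs_times = [0.0, 1.5, 2.0, 7.3, 8.1, 15.0]
light_curve = simulate_ou(obs_times, mu=18.0, sigma=0.05, tau=100.0)
```

Because the transition density is available in closed form for any gap `dt`, irregular sampling needs no discretization. Under these assumptions, a delayed image could be simulated by evaluating the same latent path at the observation times shifted by the delay.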

Si S, Van Dyk DA, von Hippel T, et al., 2017, A hierarchical model for the ages of Galactic halo white dwarfs, *Monthly Notices of the Royal Astronomical Society*, Vol: 468, Pages: 4374-4388, ISSN: 1365-2966

In astrophysics, we often aim to estimate one or more parameters for each member object in a population and study the distribution of the fitted parameters across the population. In this paper, we develop novel methods that allow us to take advantage of existing software designed for such case-by-case analyses to simultaneously fit parameters of both the individual objects and the parameters that quantify their distribution across the population. Our methods are based on Bayesian hierarchical modelling that is known to produce parameter estimators for the individual objects that are on average closer to their true values than estimators based on case-by-case analyses. We verify this in the context of estimating ages of Galactic halo white dwarfs (WDs) via a series of simulation studies. Finally, we deploy our new techniques on optical and near-infrared photometry of 10 candidate halo WDs to obtain estimates of their ages along with an estimate of the mean age of Galactic halo WDs of $12.11^{+0.85}_{-0.86}$ Gyr. Although this sample is small, our technique lays the groundwork for large-scale studies using data from the Gaia mission.
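
The claim that hierarchical estimators sit closer to the truth than case-by-case ones comes from shrinkage toward the population mean. The sketch below shows it in the simplest conjugate setting: a normal-normal model with known population mean and spread. This is a deliberately simplified stand-in, not the method of the paper, and the function name and numbers are hypothetical.

```python
def shrink(estimates, std_errors, pop_mean, pop_sd):
    """Posterior means of object-level parameters under a normal-normal model.

    Each case-by-case estimate y with standard error s is pulled toward
    pop_mean with weight determined by the relative precisions.
    """
    shrunk = []
    for y, s in zip(estimates, std_errors):
        weight = (1.0 / s ** 2) / (1.0 / s ** 2 + 1.0 / pop_sd ** 2)
        shrunk.append(weight * y + (1.0 - weight) * pop_mean)
    return shrunk


# Hypothetical ages in Gyr: the noisier the estimate, the more it is shrunk.
case_by_case = [10.0, 14.0, 9.0]
errors = [1.0, 1.0, 3.0]
hierarchical = shrink(case_by_case, errors, pop_mean=12.0, pop_sd=1.0)
```

In a full hierarchical analysis the population mean and spread are estimated jointly with the object-level parameters rather than fixed in advance.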

Wagner-Kaiser R, Sarajedini A, von Hippel T, et al., 2017, The ACS Survey of Galactic Globular Clusters XIV: Bayesian Single-Population Analysis of 69 Globular Clusters, *Monthly Notices of the Royal Astronomical Society*, Vol: 468, Pages: 1038-1055, ISSN: 0035-8711

We use Hubble Space Telescope (HST) imaging from the ACS Treasury Survey to determine fits for single population isochrones of 69 Galactic globular clusters. Using robust Bayesian analysis techniques, we simultaneously determine ages, distances, absorptions and helium values for each cluster under the scenario of a ‘single’ stellar population on model grids with solar ratio heavy element abundances. The set of cluster parameters is determined in a consistent and reproducible manner for all clusters using the Bayesian analysis suite BASE-9. Our results are used to re-visit the age–metallicity relation. We find correlations with helium and several other parameters such as metallicity, binary fraction and proxies for cluster mass. The helium abundances of the clusters are also considered in the context of carbon, nitrogen, and oxygen abundances and the multiple population scenario.

Algeri S, van Dyk DA, Conrad J, et al., 2016, On methods for correcting for the look-elsewhere effect in searches for new physics, *Journal of Instrumentation*, Vol: 11, Pages: P12010-P12010, ISSN: 1748-0221

The search for new significant peaks over an energy spectrum often involves a statistical multiple hypothesis testing problem. Separate tests of hypothesis are conducted at different locations over a fine grid producing an ensemble of local p-values, the smallest of which is reported as evidence for the new resonance. Unfortunately, controlling the false detection rate (type I error rate) of such procedures may lead to excessively stringent acceptance criteria. In the recent physics literature, two promising statistical tools have been proposed to overcome these limitations. In 2005, a method to "find needles in haystacks" was introduced by Pilla et al. [1], and a second method was later proposed by Gross and Vitells [2] in the context of the "look-elsewhere effect" and trial factors. We show that, although the two methods exhibit similar performance for large sample sizes, for relatively small sample sizes, the method of Pilla et al. leads to an artificial inflation of statistical power that stems from an increase in the false detection rate. This method, on the other hand, becomes particularly useful in multidimensional searches, where the Monte Carlo simulations required by Gross and Vitells are often infeasible. We apply the methods to realistic simulations of the Fermi Large Area Telescope data, in particular the search for dark matter annihilation lines. Further, we discuss the counter-intuitive scenario where the look-elsewhere corrections are more conservative than much more computationally efficient corrections for multiple hypothesis testing. Finally, we provide general guidelines for navigating the tradeoffs between statistical and computational efficiency when selecting a statistical procedure for signal detection.
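
For reference, the "much more computationally efficient corrections for multiple hypothesis testing" mentioned above include the classical Bonferroni and Šidák adjustments, sketched here. These are generic textbook corrections, not the methods of Pilla et al. or Gross and Vitells; the function names and the example numbers are illustrative, and the Šidák form assumes independent tests.

```python
def bonferroni_global_p(min_local_p, n_tests):
    """Upper bound on the global p-value from the smallest of n_tests local p-values."""
    return min(1.0, n_tests * min_local_p)


def sidak_global_p(min_local_p, n_tests):
    """Global p-value for the smallest local p-value, assuming independent tests."""
    return 1.0 - (1.0 - min_local_p) ** n_tests


# Illustrative numbers: smallest local p-value 0.003 over a grid of 10 locations.
p_bonf = bonferroni_global_p(0.003, 10)
p_sidak = sidak_global_p(0.003, 10)
```

Both need only the smallest local p-value and the number of grid points, which is why they are cheap compared with simulation-based look-elsewhere corrections.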

McKeough K, Siemiginowska A, Cheung CC, et al., 2016, Detecting relativistic X-ray jets in high-redshift quasars, *Astrophysical Journal*, Vol: 833, ISSN: 0004-637X

We analyze Chandra X-ray images of a sample of 11 quasars that are known to contain kiloparsec scale radio jets. The sample consists of five high-redshift (z ≥ 3.6) flat-spectrum radio quasars, and six intermediate redshift (2.1 < z < 2.9) quasars. The dataset includes four sources with integrated steep radio spectra and three with flat radio spectra. A total of 25 radio jet features are present in this sample. We apply a Bayesian multi-scale image reconstruction method to detect and measure the X-ray emission from the jets. We compute deviations from a baseline model that does not include the jet, and compare observed X-ray images with those computed with simulated images where no jet features exist. This allows us to compute p-value upper bounds on the significance that an X-ray jet is detected in a pre-determined region of interest. We detected 12 of the features unambiguously, and an additional 6 marginally. We also find residual emission in the cores of 3 quasars and in the background of 1 quasar that suggest the existence of unresolved X-ray jets. The dependence of the X-ray to radio luminosity ratio on redshift is a potential diagnostic of the emission mechanism, since the inverse Compton scattering of cosmic microwave background photons (IC/CMB) is thought to be redshift dependent, whereas in synchrotron models no clear redshift dependence is expected. We find that the high-redshift jets have X-ray to radio flux ratios that are marginally inconsistent with those from lower redshifts, suggesting that either the X-ray emission is due to the IC/CMB rather than the synchrotron process, or that high redshift jets are qualitatively different.

Si S, van Dyk DA, von Hippel T, Sensitivity Analysis of Hierarchical Models for the Ages of Galactic Halo White Dwarfs, 20th European White Dwarf Workshop, Proceedings of a conference

Shariff H, Dhawan S, Jiao X, et al., 2016, Standardizing type Ia supernovae optical brightness using near infrared rebrightening time, *Monthly Notices of the Royal Astronomical Society*, Vol: 463, Pages: 4311-4316, ISSN: 1365-2966

Accurate standardization of Type Ia supernovae (SNIa) is instrumental to the usage of SNIa as distance indicators. We analyse a homogeneous sample of 22 low-z SNIa, observed by the Carnegie Supernova Project (CSP) in the optical and near infra-red (NIR). We study the time of the second peak in the J-band, $t_2$, as an alternative standardization parameter of SNIa peak optical brightness, as measured by the standard SALT2 parameter $m_B$. We use BAHAMAS, a Bayesian hierarchical model for SNIa cosmology, to estimate the residual scatter in the Hubble diagram. We find that in the absence of a colour correction, $t_2$ is a better standardization parameter compared to stretch: $t_2$ has a $1\sigma$ posterior interval for the Hubble residual scatter of $\sigma_{\Delta\mu} = \{0.250, 0.257\}$ mag, compared to $\sigma_{\Delta\mu} = \{0.280, 0.287\}$ mag when stretch ($x_1$) alone is used. We demonstrate that when employed together with a colour correction, $t_2$ and stretch lead to similar residual scatter. Using colour, stretch and $t_2$ jointly as standardization parameters does not result in any further reduction in scatter, suggesting that $t_2$ carries redundant information with respect to stretch and colour. With a much larger SNIa NIR sample at higher redshift in the future, $t_2$ could be a useful quantity to perform robustness checks of the standardization procedure.

Wagner-Kaiser R, Stenning D, Sarajedini A, et al., 2016, Bayesian analysis of two stellar populations in Galactic globular clusters III: Analysis of 30 clusters, *Monthly Notices of the Royal Astronomical Society*, Vol: 463, Pages: 3768-3782, ISSN: 1365-2966

We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic Globular Clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences between the two populations in the clusters fall in the range of ∼0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster, and we also find that the proportion of the first population of stars increases with mass as well. Our results are examined in the context of proposed globular cluster formation scenarios. Additionally, we leverage our Bayesian technique to shed light on inconsistencies between the theoretical models and the observed data.

Jeffery EJ, von Hippel T, Van Dyk DA, et al., 2016, A Bayesian analysis of the ages of four open clusters, *Astrophysical Journal*, Vol: 828, ISSN: 1538-4357

In this paper we apply a Bayesian technique to determine the best fit of stellar evolution models to find the main sequence turn off age and other cluster parameters of four intermediate-age open clusters: NGC 2360, NGC 2477, NGC 2660, and NGC 3960. Our algorithm utilizes a Markov chain Monte Carlo technique to fit these various parameters, objectively finding the best fit isochrone for each cluster. The result is a high precision isochrone fit. We compare these results with those of traditional “by eye” isochrone fitting methods. By applying this Bayesian technique to NGC 2360, NGC 2477, NGC 2660, and NGC 3960 we determine the ages of these clusters to be 1.35 ± 0.05, 1.02 ± 0.02, 1.64 ± 0.04, and 0.860 ± 0.04 Gyr, respectively. The results of this paper continue our effort to determine cluster ages to higher precision than that offered by these traditional methods of isochrone fitting.

Shariff H, Jiao X, Trotta R, et al., 2016, BAHAMAS: new analysis of type Ia supernovae reveals inconsistencies with standard cosmology, *The Astrophysical Journal*, Vol: 827, ISSN: 1538-4357

We present results obtained by applying our BAyesian HierArchical Modeling for the Analysis of Supernova cosmology (BAHAMAS) software package to the 740 spectroscopically confirmed supernovae of type Ia (SNe Ia) from the "Joint Light-curve Analysis" (JLA) data set. We simultaneously determine cosmological parameters and standardization parameters, including corrections for host galaxy mass, residual scatter, and object-by-object intrinsic magnitudes. Combining JLA and Planck data on the cosmic microwave background, we find significant discrepancies in cosmological parameter constraints with respect to the standard analysis: we find $\Omega_{\rm m} = 0.399 \pm 0.027$, $2.8\sigma$ higher than previously reported, and $w = -0.910 \pm 0.045$, $1.6\sigma$ higher than the standard analysis. We determine the residual scatter to be $\sigma_{\rm res} = 0.104 \pm 0.005$. We confirm (at the 95% probability level) the existence of two subpopulations segregated by host galaxy mass, separated at $\log_{10}(M/M_\odot) = 10$, differing in mean intrinsic magnitude by $0.055 \pm 0.022$ mag, lower than previously reported. Cosmological parameter constraints, however, are unaffected by the inclusion of corrections for host galaxy mass. We find $\sim 4\sigma$ evidence for a sharp drop in the value of the color correction parameter, $\beta(z)$, at a redshift $z_t = 0.662 \pm 0.055$. We rule out some possible explanations for this behavior, which remains unexplained.

Wong RKW, Kashyap VL, Lee TCM, et al., 2016, Detecting abrupt changes in the spectra of high-energy astrophysical sources, *Annals of Applied Statistics*, Vol: 10, Pages: 1107-1134, ISSN: 1941-7330

Variable-intensity astronomical sources are the result of complex and often extreme physical processes. Abrupt changes in source intensity are typically accompanied by equally sudden spectral shifts, that is, sudden changes in the wavelength distribution of the emission. This article develops a method for modeling photon counts collected from observation of such sources. We embed change points into a marked Poisson process, where photon wavelengths are regarded as marks and both the Poisson intensity parameter and the distribution of the marks are allowed to change. To the best of our knowledge, this is the first effort to embed change points into a marked Poisson process. Between the change points, the spectrum is modeled nonparametrically using a mixture of a smooth radial basis expansion and a number of local deviations from the smooth term representing spectral emission lines. Because the model is over-parameterized, we employ an $\ell_1$ penalty. The tuning parameter in the penalty and the number of change points are determined via the minimum description length principle. Our method is validated via a series of simulation studies and its practical utility is illustrated in the analysis of the ultra-fast rotating yellow giant star known as FK Com.

Stenning D, Wagner-Kaiser R, Robinson E, et al., 2016, BAYESIAN ANALYSIS OF TWO STELLAR POPULATIONS IN GALACTIC GLOBULAR CLUSTERS I: STATISTICAL AND COMPUTATIONAL METHODS, *Astrophysical Journal*, Vol: 826, ISSN: 1538-4357

We develop a Bayesian model for globular clusters composed of multiple stellar populations, extending earlier statistical models for open clusters composed of simple (single) stellar populations (e.g., van Dyk et al. 2009; Stein et al. 2013). Specifically, we model globular clusters with two populations that differ in helium abundance. Our model assumes a hierarchical structuring of the parameters in which physical properties (age, metallicity, helium abundance, distance, absorption, and initial mass) are common to (i) the cluster as a whole or to (ii) individual populations within a cluster, or are unique to (iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm is devised for model fitting that greatly improves convergence relative to its precursor non-adaptive MCMC algorithm. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We use numerical studies to demonstrate that our method can recover parameters of two-population clusters, and also show model misspecification can potentially be identified. As a proof of concept, we analyze the two stellar populations of globular cluster NGC 5272 using our model and methods. (BASE-9 is available from GitHub: https://github.com/argiopetech/base/releases).

Wagner-Kaiser R, Stenning D, Robinson E, et al., 2016, Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters. II. NGC 5024, NGC 5272, and NGC 6352, *Astrophysical Journal*, Vol: 826, ISSN: 1538-4357

We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival Advanced Camera for Surveys Treasury observations of Galactic Globular Clusters to find and characterize two stellar populations in NGC 5024 (M53), NGC 5272 (M3), and NGC 6352. For these three clusters, both single and double-population analyses are used to determine a best fit isochrone(s). We employ a sophisticated Bayesian analysis technique to simultaneously fit the cluster parameters (age, distance, absorption, and metallicity) that characterize each cluster. For the two-population analysis, unique population level helium values are also fit to each distinct population of the cluster and the relative proportions of the populations are determined. We find differences in helium ranging from ~0.05 to 0.11 for these three clusters. Model grids with solar α-element abundances ([α/Fe] = 0.0) and enhanced α-elements ([α/Fe] = 0.4) are adopted.

Algeri S, Conrad J, van Dyk DA, 2016, A method for comparing non-nested models with application to astrophysical searches for new physics, *Monthly Notices of the Royal Astronomical Society: Letters*, Vol: 458, Pages: L84-L88, ISSN: 1745-3933

Searches for unknown physics and decisions between competing astrophysical models to explain data both rely on statistical hypothesis testing. The usual approach in searches for new physical phenomena is based on the statistical likelihood ratio test and its asymptotic properties. In the common situation when neither of the two models under comparison is a special case of the other, i.e., when the hypotheses are non-nested, this test is not applicable. In astrophysics, this problem occurs when two models that reside in different parameter spaces are to be compared. An important example is the recently reported excess emission in astrophysical γ-rays and the question of whether its origin is known astrophysics or dark matter. We develop and study a new, simple, generally applicable, frequentist method and validate its statistical properties using a suite of simulation studies. We exemplify it on realistic simulated data of the Fermi-Large Area Telescope γ-ray satellite, where non-nested hypothesis testing appears in the search for particle dark matter.

Stein NM, Van Dyk DA, Kashyap VL, 2016, Preprocessing solar images while preserving their latent structure, *Statistics and its Interface*, Vol: 9, Pages: 535-551, ISSN: 1938-7997

Telescopes such as the Atmospheric Imaging Assembly aboard the Solar Dynamics Observatory, a NASA satellite, collect massive streams of high resolution images of the Sun through multiple wavelength filters. Reconstructing pixel-by-pixel thermal properties based on these images can be framed as an ill-posed inverse problem with Poisson noise, but this reconstruction is computationally expensive and there is disagreement among researchers about what regularization or prior assumptions are most appropriate. This article presents an image segmentation framework for preprocessing such images in order to reduce the data volume while preserving as much thermal information as possible for later downstream analyses. The resulting segmented images reflect thermal properties but do not depend on solving the ill-posed inverse problem. This allows users to avoid the Poisson inverse problem altogether or to tackle it on each of $\sim 10$ segments rather than on each of $\sim 10^7$ pixels, reducing computing time by a factor of $\sim 10^6$. We employ a parametric class of dissimilarities that can be expressed as cosine dissimilarity functions or Hellinger distances between nonlinearly transformed vectors of multi-passband observations in each pixel. We develop a decision theoretic framework for choosing the dissimilarity that minimizes the expected loss that arises when estimating identifiable thermal properties based on segmented images rather than on a pixel-by-pixel basis. We also examine the efficacy of different dissimilarities for recovering clusters in the underlying thermal properties. The expected losses are computed under scientifically motivated prior distributions. Two simulation studies guide our choices of dissimilarity function. We illustrate our method by segmenting images of a coronal hole observed on 26 February 2015.

Stein NM, van Dyk DA, Kashyap VL, et al., 2015, Detecting Unspecified Structure in Low-Count Images, *Astrophysical Journal*, Vol: 813, ISSN: 1538-4357

Unexpected structure in images of astronomical sources often presents itself upon visual inspection of the image, but such apparent structure may either correspond to true features in the source or be due to noise in the data. This paper presents a method for testing whether inferred structure in an image with Poisson noise represents a significant departure from a baseline (null) model of the image. To infer image structure, we conduct a Bayesian analysis of a full model that uses a multiscale component to allow flexible departures from the posited null model. As a test statistic, we use a tail probability of the posterior distribution under the full model. This choice of test statistic allows us to estimate a computationally efficient upper bound on a p-value that enables us to draw strong conclusions even when there are limited computational resources that can be devoted to simulations under the null model. We demonstrate the statistical performance of our method on simulated images. Applying our method to an X-ray image of the quasar 0730+257, we find significant evidence against the null model of a single point source and uniform background, lending support to the claim of an X-ray jet.

Jones DE, Kashyap VL, van Dyk DA, 2015, DISENTANGLING OVERLAPPING ASTRONOMICAL SOURCES USING SPATIAL AND SPECTRAL INFORMATION, *Astrophysical Journal*, Vol: 808, ISSN: 1538-4357

We present a powerful new algorithm that combines both spatial information (event locations and the point-spread function) and spectral information (photon energies) to separate photons from overlapping sources. We use Bayesian statistical methods to simultaneously infer the number of overlapping sources, to probabilistically separate the photons among the sources, and to fit the parameters describing the individual sources. Using the Bayesian joint posterior distribution, we are able to coherently quantify the uncertainties associated with all these parameters. The advantages of combining spatial and spectral information are demonstrated through a simulation study. The utility of the approach is then illustrated by analysis of observations of FK Aqr and FL Aqr with the XMM-Newton Observatory and the central region of the Orion Nebula Cluster with the Chandra X-ray Observatory.

van Dyk DA, Jiao X, 2015, Metropolis-Hastings Within Partially Collapsed Gibbs Samplers, *Journal of Computational and Graphical Statistics*, Vol: 24, Pages: 301-327, ISSN: 1537-2715

The partially collapsed Gibbs (PCG) sampler offers a new strategy for improving the convergence of a Gibbs sampler. PCG achieves faster convergence by reducing the conditioning in some of the draws of its parent Gibbs sampler. Although this can significantly improve convergence, care must be taken to ensure that the stationary distribution is preserved. The conditional distributions sampled in a PCG sampler may be incompatible and permuting their order may upset the stationary distribution of the chain. Extra care must be taken when Metropolis-Hastings (MH) updates are used in some or all of the updates. Reducing the conditioning in an MH within Gibbs sampler can change the stationary distribution, even when the PCG sampler would work perfectly if MH were not used. In fact, a number of samplers of this sort that have been advocated in the literature do not actually have the target stationary distributions. In this article, we illustrate the challenges that may arise when using MH within a PCG sampler and develop a general strategy for using such updates while maintaining the desired stationary distribution. Theoretical arguments provide guidance when choosing between different MH within PCG sampling schemes. Finally, we illustrate the MH within PCG sampler and its computational advantage using several examples from our applied work.

Liao K, Treu T, Marshall P, et al., 2015, STRONG LENS TIME DELAY CHALLENGE. II. RESULTS OF TDC1, *Astrophysical Journal*, Vol: 800, ISSN: 0004-637X

