Dr Roger Newson

Faculty of Medicine, School of Public Health

Honorary Research Associate

Contact

+44 (0)20 7594 2784, r.newson

Assistant

Ms Dorothea Cockerell +44 (0)20 7594 3368

Location

351, Reynolds Building, Charing Cross Campus

Summary

Publications

Farrell A, Alaghband-Zadeh J, Carter G, Newson RB, Cream JJ et al., 1999, Do some men with acne vulgaris have raised levels of LH?, CLINICAL ENDOCRINOLOGY, Vol: 50, Pages: 393-397, ISSN: 0300-0664

Citations: 1

Journal article

Tachakra S, Ho S, Lynch M, Newson R et al., 1998, Should doctors practise resuscitation skills on newly deceased patients? A survey of public opinion, JOURNAL OF THE ROYAL SOCIETY OF MEDICINE, Vol: 91, Pages: 576-578, ISSN: 0141-0768

Citations: 8

Journal article

Mallon E, Young D, Bunce M, Gotch FM, Easterbrook PT, Newson R, Bunker CB et al., 1998, HLA-Cw*0602 and HIV-associated psoriasis, BRITISH JOURNAL OF DERMATOLOGY, Vol: 139, Pages: 527-533, ISSN: 0007-0963

Citations: 31

Journal article

Gibbs RGJ, Todd JC, Irvine C, Lawrenson R, Newson R, Greenhalgh RM, Davies AH et al., 1998, Relationship between the regional and national incidence of transient ischaemic attack and stroke and performance of carotid endarterectomy, EUROPEAN JOURNAL OF VASCULAR AND ENDOVASCULAR SURGERY, Vol: 16, Pages: 47-52, ISSN: 1078-5884

Citations: 26

Journal article

Newson R, Strachan D, Archibald E, Emberlin J, Hardaker P, Collier C et al., 1998, Acute asthma epidemics, weather and pollen in England, 1987-1994, EUROPEAN RESPIRATORY JOURNAL, Vol: 11, Pages: 694-701, ISSN: 0903-1936

Citations: 85

Journal article

Farmer RDT, Newson RB, MacRae K, Lawrenson RA, Tyrer F et al., 1997, Mortality from venous thromboembolism among young women in Europe: no evidence for any effect of third generation oral contraceptives, JOURNAL OF EPIDEMIOLOGY AND COMMUNITY HEALTH, Vol: 51, Pages: 630-635, ISSN: 0143-005X

Citations: 8

Journal article

Newson R, Strachan D, Archibald E, Emberlin J, Hardaker P, Collier C et al., 1997, Effect of thunderstorms and airborne grass pollen on the incidence of acute asthma in England, 1990-94, THORAX, Vol: 52, Pages: 680-685, ISSN: 0040-6376

Citations: 86

Journal article

Newson R, From datasets to resultssets in Stata

A resultsset is a Stata dataset created as output by a Stata program. It can be used as input to other Stata programs, which may in turn output the results as publication-ready plots or tables. Programs that create resultssets include xcontract, xcollapse, parmest, parmby and descsave. Stata resultssets do a similar job to SAS output data sets, which are saved to disk files. However, in Stata, the user typically has the options of saving a resultsset to a disk file, writing it to the memory (overwriting any pre-existing data set), or simply listing it. Resultssets are often saved to temporary files, using the tempfile command. This lecture introduces programs that create resultssets, and also programs that do things with resultssets after they have been created. listtex outputs resultssets to tables that can be inserted into a Microsoft Word, HTML or LaTeX document. eclplot inputs resultssets and creates confidence interval plots. Other programs, such as sencode and tostring, process resultssets after they are created and before they are listed, tabulated or plotted. These programs, used together, have a power not always appreciated if the user simply reads the on-line help for each package.

Scholarly edition

Newson R, PARMEST: Stata module to create new data set with one observation per parameter of most recent model

The parmest package has 4 modules: parmest, parmby, parmcip and metaparm. parmest creates an output dataset, with 1 observation per parameter of the most recent estimation results, and variables corresponding to parameter names, estimates, standard errors, z- or t-test statistics, P-values, confidence limits and other parameter attributes. parmby is a quasi-byable extension to parmest, which calls an estimation command, and creates a new dataset, with 1 observation per parameter if the by() option is unspecified, or 1 observation per parameter per by-group if the by() option is specified. parmcip inputs variables containing estimates, standard errors and (optionally) degrees of freedom, and computes new variables containing confidence intervals and P-values. metaparm inputs a parmest-type dataset with 1 observation for each of a set of independently-estimated parameters, and outputs a dataset with 1 observation for each of a set of linear combinations of these parameters, with confidence intervals and P-values, as for a meta-analysis. The output datasets created by parmest, parmby or metaparm may be listed to the Stata log and/or saved to a file and/or retained in memory (overwriting any pre-existing dataset). The confidence intervals, P-values and other parameter attributes in the dataset may be listed and/or plotted and/or tabulated.

Software
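As a rough illustration of what a parmest-type resultsset holds, and of the calculation that parmcip is described as performing, the Python sketch below computes z statistics, two-sided P-values and 95% confidence limits from estimates and standard errors. It is not Stata code and not the package's implementation. It assumes the standard Normal distribution throughout (ignoring degrees of freedom), represents the resultsset as a list of dictionaries, and uses made-up numbers. The function name parmcip_normal and the column names min95 and max95 are chosen for illustration only.

```python
from statistics import NormalDist


def parmcip_normal(estimate, stderr, level=95.0):
    """Return (z, two-sided P-value, lower limit, upper limit) for one
    parameter, assuming a standard Normal sampling distribution."""
    norm = NormalDist()  # mean 0, standard deviation 1
    z = estimate / stderr
    p = 2.0 * (1.0 - norm.cdf(abs(z)))
    # Critical value, e.g. about 1.96 for a 95% interval.
    mult = norm.inv_cdf(0.5 + level / 200.0)
    return z, p, estimate - mult * stderr, estimate + mult * stderr


# A toy resultsset: one row (dict) per parameter; the numbers are made up.
resultsset = [
    {"parm": "age", "estimate": 0.50, "stderr": 0.25},
    {"parm": "_cons", "estimate": 2.00, "stderr": 1.00},
]
for row in resultsset:
    row["z"], row["p"], row["min95"], row["max95"] = parmcip_normal(
        row["estimate"], row["stderr"]
    )

for row in resultsset:
    print(f'{row["parm"]:6} {row["estimate"]:6.2f} {row["min95"]:7.3f} '
          f'{row["max95"]:7.3f} {row["p"]:.4f}')
```

Keeping one row per parameter is what makes the later steps (listing, tabulating, plotting) simple: each is a loop over rows rather than a parse of a log.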

Newson R, CCWEIGHT: Stata module to generate inverse sampling probability weights

ccweight takes, as input, a varlist whose distinct values correspond to case groups, and a status variable (1 for cases, 0 for controls) in the option status. It creates, as output, a new variable, suitable for use as a pweight variable when the case-control study is analysed by regression with robust variances.

Software

Newson RB, Sensible parameters for polynomials and other splines

Splines, including polynomials, are traditionally used to model nonlinear relationships involving continuous predictors. However, when they are included in linear models (or generalized linear models), the estimated parameters for polynomials are not easy for nonmathematicians to understand, and the estimated parameters for other splines are often not easy even for mathematicians to understand. It would be easier if the parameters were values of the polynomial or spline at reference points on the x-axis, or differences or ratios between the values of the spline at the reference points and the value of the spline at a base reference point. The bspline package can be downloaded from Statistical Software Components (SSC), and generates spline bases for inclusion in the design matrices of linear models, based on Schoenberg B-splines. The package now has a recently added module, flexcurv, which inputs a sequence of reference points on the x-axis and outputs a spline basis, based on equally spaced knots generated automatically, whose parameters are the values of the spline at the reference points. This spline basis can be modified by excluding the spline vector at a base reference point and including the unit vector. If this is done, then the parameter corresponding to the unit vector will be the value of the spline at the base reference point, and the parameters corresponding to the remaining reference spline vectors will be differences between the values of the spline at the corresponding reference points and the value of the spline at the base reference point. The spline bases are therefore extensions, to continuous factors, of the bases of unit vectors and/or indicator functions used to model discrete factors. It is possible to combine these bases for different continuous and/or discrete factors in the same way, using product bases in a design matrix to estimate factor-value combination means and/or factor-value effects and/or factor interactions.

Scholarly edition
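The reparameterization described in the "Sensible parameters" entry can be illustrated without B-splines by using the simplest spline of all, a single polynomial. In the Python sketch below (our own illustration, not code from bspline, frencurv or flexcurv), the Lagrange basis for a cubic through four reference points has coefficients equal to the polynomial's values at those points. Replacing the basis vector at the base reference point by the unit vector then gives coefficients equal to the base value and to differences from it. The function names, reference points and values are hypothetical.

```python
def reference_basis(refpts):
    """Lagrange basis for the polynomial through len(refpts) points:
    basis function j equals 1 at refpts[j] and 0 at the other reference
    points, so its coefficient is the polynomial's value at refpts[j]."""
    def basis(x):
        values = []
        for j, xj in enumerate(refpts):
            lj = 1.0
            for k, xk in enumerate(refpts):
                if k != j:
                    lj *= (x - xk) / (xj - xk)
            values.append(lj)
        return values
    return basis


def base_reference_basis(refpts, base):
    """Same basis, with the vector for refpts[base] replaced by the unit
    (constant 1) vector."""
    full = reference_basis(refpts)

    def basis(x):
        values = full(x)
        values[base] = 1.0
        return values
    return basis


refpts = [0.0, 1.0, 2.0, 3.0]  # four reference points -> a cubic
vals = [1.0, 3.0, 2.0, 5.0]    # hypothetical values at those points

# Parameters for the modified basis: the base value, then differences.
diffs = [vals[0]] + [v - vals[0] for v in vals[1:]]

x = 1.7
f_values = sum(c * b for c, b in zip(vals, reference_basis(refpts)(x)))
f_diffs = sum(c * b for c, b in zip(diffs, base_reference_basis(refpts, 0)(x)))
print(f_values, f_diffs)  # the same curve value, up to rounding
```

The two parameterizations describe the same curve because, in this sketch, the Lagrange basis functions sum to 1 at every x, which is what lets the unit vector stand in for the base reference vector.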

Newson RB, Post-parmest peripherals: fvregen, invcise, and qqvalue

The parmest package is used with Stata estimation commands to produce output datasets (or results-sets) with one observation per estimated parameter, and data on parameter names, estimates, confidence limits, p-values, and other parameter attributes. These results-sets can then be input to other Stata programs to produce tables, listings, plots, and secondary results-sets containing derived parameters. Three recently added packages for post-parmest processing are fvregen, invcise, and qqvalue. fvregen is used when the parameters belong to models containing factor variables, introduced in Stata version 11. It regenerates these factor variables in the results-set, enabling the user to plot, list, or tabulate factor levels with estimates and confidence limits of parameters specific to these factor levels. invcise calculates standard errors inversely from confidence limits produced without standard errors, such as those for medians and for Hodges-Lehmann median differences. These standard errors can then be input, with the estimates, into the metaparm module of parmest to produce confidence intervals for linear combinations of medians or of median differences, such as those used in meta-analysis or interaction estimation. qqvalue inputs the p-values in a results-set and creates a new variable containing the quasi-q-values, which are calculated by inverting a multiple-test procedure designed to control the familywise error rate (FWER) or the false discovery rate (FDR). The quasi-q-value for each p-value is the minimum FWER or FDR for which that p-value would be in the discovery set if the specified multiple-test procedure were used on the full set of p-values. fvregen, invcise, qqvalue, and parmest can be downloaded from SSC.

Scholarly edition
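To make the "minimum FDR for which a p-value would be in the discovery set" idea concrete, the Python sketch below inverts one well-known FDR-controlling procedure, the Benjamini-Hochberg (Simes-type) step-up procedure. It is an illustration of the idea under that one choice of procedure, not qqvalue's code, and the function name bh_qvalues and the p-values are invented.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values: for each p-value, the smallest FDR
    level at which the step-up procedure would include it in the
    discovery set."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p
    q = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down to the smallest, keeping the
    # minimum of p * m / rank seen so far (the step-up property).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values
qvals = bh_qvalues(pvals)
print(qvals)
```

The running minimum is what makes this an inversion of a step-up procedure: a p-value is discovered at level alpha if any p-value ranked at or above it passes its own threshold.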

Newson RB, Homoskedastic adjustment inflation factors in model selection

Insufficient confounder adjustment is viewed as a common source of "false discoveries", especially in the epidemiology sector. However, adjustment for "confounders" that are correlated with the exposure, but which do not independently predict the outcome, may cause loss of power to detect the exposure effect. On the other hand, choosing confounders based on "stepwise" methods is subject to many hazards, which imply that the confidence interval eventually published is likely not to have the advertised coverage probability for the effect that we wanted to know. We would like to be able to find a model in the data on exposures and confounders, and then to estimate the parameters of that model from the conditional distribution of the outcome, given the exposures and confounders. The haif package, downloadable from SSC, calculates the homoskedastic adjustment inflation factors (HAIFs), by which the variances and standard errors of coefficients for a matrix of X-variables are scaled (or inflated), if a matrix of unnecessary confounders A is also included in a regression model, assuming equal variances (homoskedasticity). These can be calculated from the A- and X-variables alone, and can be used to inform the choice of a set of models eventually fitted to the outcome data, together with the usual criteria involving causality and prior opinion. Examples are given of the use of HAIFs and their ratios.

Scholarly edition
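In the simplest case covered by the HAIF entry, one X-variable and one unnecessary confounder A in a linear model with an intercept, the variance inflation caused by adjusting for A is the familiar 1/(1 - r^2), where r is the sample correlation between X and A, and the standard-error inflation is its square root. The Python sketch below computes these two numbers from the X- and A-values alone. It assumes homoskedasticity and an unchanged residual variance, and it does not reproduce haif's general matrix computation; the function name and data are hypothetical.

```python
import math


def haif_one_x_one_a(x, a):
    """Return (variance factor, SE factor) for the coefficient of x when a
    single extra covariate a is added to a linear model with an intercept,
    assuming equal residual variances in both models."""
    n = len(x)
    mean_x, mean_a = sum(x) / n, sum(a) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sxa = sum((xi - mean_x) * (ai - mean_a) for xi, ai in zip(x, a))
    r_squared = sxa ** 2 / (sxx * saa)  # squared sample correlation
    variance_factor = 1.0 / (1.0 - r_squared)  # undefined if r^2 == 1
    return variance_factor, math.sqrt(variance_factor)


# Hypothetical exposure values and two candidate confounders.
x = [1.0, 2.0, 3.0, 4.0]
a_correlated = [1.0, 2.0, 3.0, 5.0]
a_uncorrelated = [1.0, -1.0, -1.0, 1.0]
print(haif_one_x_one_a(x, a_correlated))    # large inflation
print(haif_one_x_one_a(x, a_uncorrelated))  # no inflation: (1.0, 1.0)
```

No outcome values appear anywhere in the calculation, which is the property the abstract relies on when it suggests using HAIFs to choose models before looking at the outcome.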

Newson R, Creating plots and tables of estimation results using parmest and friends

Statisticians make their living mostly by producing confidence intervals and p-values. However, those supplied in the Stata log are not in any fit state to be delivered to the end user, who usually at least wants them tabulated and formatted, and may appreciate them even more if they are plotted on a graph for immediate impact. The parmest package was developed to make this easy, and consists of two programs. These are parmest, which converts the latest estimation results to a data set with one observation per estimated parameter and data on confidence intervals, p-values and other estimation results, and parmby, a "quasi-byable" front end to parmest, which is like statsby, but creates a data set with one observation per parameter per by-group instead of a data set with one observation per by-group. The parmest package can be used together with a team of other Stata programs to produce a wide range of tables and plots of confidence intervals and p-values. The programs descsave and factext can be used with parmby to create plots of confidence intervals against values of a categorical factor included in the fitted model, using dummy variables produced by xi or tabulate. The user may easily fit multiple models, produce a parmby output data set for each one, and concatenate these output data sets using the program dsconcat to produce a combined data set, which can then be used to produce tables or plots involving parameters from all the models. For instance, the user might tabulate or plot unadjusted and adjusted regression parameters side by side, together with their confidence limits and/or p-values. The parmest team is particularly useful when dealing with large volumes of results derived from multiple multi-parameter models.

Scholarly edition

Newson RB, Robust confidence intervals for Hodges–Lehmann median difference

The cendif module is part of the somersd package, and calculates confidence intervals for the Hodges–Lehmann median difference between values of a variable in two subpopulations. The traditional Lehmann formula, unlike the formula used by cendif, assumes that the two subpopulation distributions are different only in location, and that the subpopulations are therefore equally variable. The cendif formula therefore contrasts with the Lehmann formula as the unequal-variance t-test contrasts with the equal-variance t-test. In a simulation study, designed to test cendif to destruction, the performance of cendif was compared to that of the Lehmann formula, using coverage probabilities and median confidence interval width ratios. The simulations involved sampling from pairs of Normal or Cauchy distributions, with subsample sizes ranging from 5 to 40, and between-subpopulation variability scale ratios ranging from 1 to 4. If the sample sizes were equal, then both methods gave coverage probabilities close to the advertised confidence level. However, if the sample sizes were unequal, then the Lehmann coverage probabilities were over-conservative if the smaller sample was from the less variable population, and over-liberal if the smaller sample was from the more variable population. The cendif coverage probability was usually closer to the advertised level, if the smaller sample was not very small. However, if the sample sizes were 5 and 40, and the two populations were equally variable, then the Lehmann coverage probability was close to its advertised level, while the cendif coverage probability was over-liberal. The cendif confidence interval, in its present form, is therefore robust both to non-Normality and to unequal variability, but may be less robust to the possibility that the smaller sample size is very small. Possibilities for improvement are discussed.

Scholarly edition
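For readers unfamiliar with the estimand in the cendif entry, the Python sketch below computes the Hodges-Lehmann median difference itself as a point estimate: the median of all pairwise differences between the two samples. It says nothing about the confidence interval, which is where the cendif and Lehmann formulas differ, and the function name and data are hypothetical.

```python
from statistics import median


def hodges_lehmann(sample1, sample2):
    """Hodges-Lehmann median difference: the median of all
    len(sample1) * len(sample2) differences sample1[i] - sample2[j]."""
    return median([a - b for a in sample1 for b in sample2])


group1 = [5.0, 7.0, 9.0]  # hypothetical data
group2 = [1.0, 2.0]
print(hodges_lehmann(group1, group2))  # 5.5
```

With unequal sample sizes, the number of pairwise differences is the product of the two sizes, so a very small smaller sample (such as 5 against 40) still yields many differences but few distinct values from that sample, which is the situation the simulations found hardest.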

Newson R, On the central role of Somers' D

Somers' D and Kendall's tau-a are parameters behind rank or nonparametric statistics, interpreted as differences between proportions. Given two bivariate data pairs (X1, Y1) and (X2, Y2), Kendall's tau-a parameter τ_XY is the difference between the probability that the two X–Y pairs are concordant and the probability that the two X–Y pairs are discordant, and Somers' D parameter D_YX is the difference between the corresponding conditional probabilities, given that the X-values are ordered. The somersd package computes confidence intervals for both parameters. The Stata 9 version of somersd uses Mata to increase computing speed and greatly extends the definition of Somers' D, allowing the X and/or Y variables to be left- or right-censored and allowing multiple versions of Somers' D for multiple sampling schemes for the X–Y pairs. In particular, we may define stratified versions of Somers' D, in which we compare only X–Y pairs from the same stratum. The strata may be defined by grouping a Rubin–Rosenbaum propensity score, based on the values of multiple confounders for an association between exposure variable X and an outcome variable Y. Therefore, rank statistics can have not only confidence intervals but also confounder-adjusted confidence intervals. Usually, we either estimate D_YX as a measure of the effect of X on Y, or we estimate D_XY as a measure of the performance of X as a predictor of Y, compared with other predictors. Alternative rank-based measures of the effect of X on Y include the Hodges–Lehmann median difference and the Theil–Sen median slope, both of which are defined in terms of Somers' D.

Scholarly edition
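The definitions in the Somers' D entry translate directly into pair counting. The Python sketch below computes the sample versions of Kendall's tau-a and Somers' D_YX over all unordered pairs: tau-a divides (concordant minus discordant) by the number of pairs, and D_YX divides the same quantity by the number of pairs whose X-values are ordered. It is a plain O(n^2) illustration with hypothetical data and helper names, without the confidence intervals, censoring or stratification that the somersd package provides.

```python
def _sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def tau_a_and_somers_d(x, y):
    """Sample Kendall's tau-a and Somers' D of Y with respect to X."""
    n = len(x)
    conc_minus_disc = 0  # (concordant pairs) - (discordant pairs)
    x_untied = 0         # pairs whose X-values differ
    for i in range(n):
        for j in range(i + 1, n):
            sx = _sign(x[i] - x[j])
            conc_minus_disc += sx * _sign(y[i] - y[j])
            x_untied += sx != 0
    tau_a = conc_minus_disc / (n * (n - 1) / 2)  # over all pairs
    # Over pairs with ordered X-values (fails if every X is tied).
    d_yx = conc_minus_disc / x_untied
    return tau_a, d_yx


x = [1, 1, 2, 3]  # hypothetical data with one tie in X
y = [1, 2, 2, 3]
tau_a, d_yx = tau_a_and_somers_d(x, y)
print(tau_a, d_yx)  # about 0.667 and 0.8
```

The tie in X is what separates the two measures here: the tied pair counts in the denominator of tau-a but not in that of D_YX.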

Newson R, Generalized confidence interval plots using commands or dialogs

Confidence intervals may be presented as publication-ready tables or as presentation-ready plots. -eclplot- produces plots of estimates and confidence intervals. It inputs a dataset (or resultsset) with one observation per parameter and variables containing estimates, lower and upper confidence limits, and a fourth variable, against which the confidence intervals are plotted. This resultsset can be used for producing both plots and tables, and may be generated using a spreadsheet or using -statsby-, -postfile- or the unofficial Stata -parmest- package. Currently, -eclplot- offers 7 plot types for the estimates and 8 plot types for the confidence intervals, each corresponding to a -graph twoway- subcommand. These plot types can be combined to produce 56 combined plot types, some of which are more useful than others, and all of which can be either horizontal or vertical. -eclplot- has a -plot()- option, allowing the user to superimpose other plots to add features such as stars for P-values. -eclplot- can be used either by typing a command, which may have multiple lines and sub-suboptions, or by using a dialog, which generates the command for users not fluent in the Stata graphics language.

Scholarly edition

Newson R, RGLM: Stata module to estimate robust generalized linear models

rglm fits generalized linear models and calculates a Huber (sandwich) estimate of the variance-covariance matrix of estimates. It can be used alone or called without arguments after a previous call to glm. As with other "robust" commands, the units may be considered to fall into clusters. This version was posted on 28 February 1999.

Software
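A Huber (sandwich) variance, as computed by rglm, replaces the model-based assumption of a common variance with each observation's own squared residual. The Python sketch below shows the idea for the simplest generalized linear model, a Normal-family, identity-link regression of y on one x with an intercept, by comparing the model-based and sandwich (HC0-type) standard errors of the slope. It is not rglm or glm, it ignores clustering, and the function name and data are made up.

```python
import math


def slope_with_sandwich_se(x, y):
    """Least-squares slope of y on x (with intercept), plus its
    model-based SE and its Huber/White (HC0) sandwich SE."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Model-based SE: assumes one common residual variance.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_model = math.sqrt(s2 / sxx)
    # Sandwich SE: "bread" 1/sxx on each side of a "meat" built from each
    # observation's own squared residual.
    meat = sum(((xi - mean_x) * e) ** 2 for xi, e in zip(x, resid))
    se_sandwich = math.sqrt(meat) / sxx
    return slope, se_model, se_sandwich


x = [1.0, 2.0, 3.0, 4.0]  # made-up data
y = [2.0, 4.0, 5.0, 9.0]
slope, se_model, se_sandwich = slope_with_sandwich_se(x, y)
print(slope, se_model, se_sandwich)
```

The point estimate is the same under both variance formulas; only the standard error, and hence the confidence interval, changes.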

Newson R, Resultssets, resultsspreadsheets, and resultsplots in Stata

Most Stata users make their living producing results in a form accessible to end users. Most of these end users cannot immediately understand Stata logs. However, they can understand tables (in paper, PDF, HTML, spreadsheet, or word processor documents) and plots (produced by using Stata or non-Stata software). Tables are produced by Stata as resultsspreadsheets, and plots are produced by Stata as resultsplots. Sometimes (but not always), resultsspreadsheets and resultsplots are produced using resultssets. Resultssets, resultsspreadsheets and resultsplots are all produced, directly or indirectly, as output by Stata commands. A resultsset is a Stata dataset, which is a table whose rows are Stata observations and whose columns are Stata variables. A resultsspreadsheet is a table in generic text format, conforming to a TeX or HTML convention, or to another convention with a column separator string and possibly left and right row delimiter strings. A resultsplot is a plot produced as output, using a resultsset or a resultsspreadsheet as input. Resultsset-producing programs include statsby, parmby, parmest, collapse, contract, xcollapse, and xcontract. Resultsspreadsheet-producing programs include outsheet, listtex, estout, and estimates table. Resultsplot-producing programs include eclplot and mileplot. There are two main approaches (or dogmas) for generating resultsspreadsheets and resultsplots. The resultsset-centered dogma is followed by parmest and parmby users and states: “Datasets make resultssets, which make resultsplots and resultsspreadsheets”. The resultsspreadsheet-centered dogma is followed by estout and estimates table users and states: “Datasets make resultsspreadsheets, which make resultssets, which make resultsplots”. The two dogmas are complementary, and each dogma has its advantages and disadvantages. The resultsspreadsheet dogma is much easier for the casual user to learn to apply in a hurry and is therefore probably preferred

Scholarly edition
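The definition of a resultsspreadsheet in the entry above (rows of cells joined by a column separator and optionally wrapped in left and right row delimiters) can be mimicked in a few lines. The Python sketch below is only an analogy to what a command such as listtex does; the function name and the argument names begin, delimiter and end are our own, and the rows are invented.

```python
def to_resultsspreadsheet(rows, delimiter, begin="", end=""):
    """Render each row as one line of text: the left row delimiter, the
    cells joined by the column separator, then the right row delimiter."""
    return [begin + delimiter.join(str(cell) for cell in row) + end
            for row in rows]


# Invented rows: parameter name, estimate, lower and upper limits.
rows = [("age", "0.50", "0.01", "0.99"), ("_cons", "2.00", "0.04", "3.96")]

tex_lines = to_resultsspreadsheet(rows, " & ", end=r" \\")
html_lines = to_resultsspreadsheet(rows, "</td><td>",
                                   begin="<tr><td>", end="</td></tr>")
csv_lines = to_resultsspreadsheet(rows, ",")

print("\n".join(tex_lines))
# age & 0.50 & 0.01 & 0.99 \\
# _cons & 2.00 & 0.04 & 3.96 \\
```

Changing only the three strings switches between TeX, HTML and comma-separated output, which is why one generic text format can serve all three conventions.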

Newson R, Splines with parameters that can be explained in words to non-mathematicians

This contribution is based on my programs bspline and frencurv, which are used to generate bases for Schoenberg B-splines and splines parameterized by their values at reference points on the X-axis (presented in STB-57 as insert sg151). The program frencurv ("French curve") makes it possible for the user to fit a model containing a spline, whose parameters are simply values of the spline at reference points on the X-axis. For instance, if I am modeling a time series of daily hospital asthma admissions counts to assess the effect of acute pollution episodes, I might use a spline to model the long-term time trend (typically a gradual long-term increase superimposed on a seasonal cycle), and include extra parameters representing the short-term increases following pollution episodes. The parameters of the spline, as presented with confidence intervals, might then be the levels of hospital admissions, on the first day of each month, expected in the absence of pollution. The spline would then be a way of interpolating expected pollution-free values for the other days of the month. The advantage of presenting splines in this way is that the spline parameters can be explained in words to a non-mathematician (e.g., a medic), which is not easy with other parameterizations used for splines.

Scholarly edition

Newson RB, parmest and extensions

The parmest package creates output datasets (or results sets) with one observation for each of a set of estimated parameters, and data on the parameter estimates, standard errors, degrees of freedom, t or z statistics, p-values, confidence limits, and other parameter attributes specified by the user. It is especially useful when parameter estimates are "mass-produced", as in a genome scan. Versions of the package have existed on SSC since 1998, when it contained the single command parmest. However, the package has since been extended with additional commands. The metaparm command allows the user to mass-produce confidence intervals for linear combinations of uncorrelated parameters. Examples include confidence intervals for a weighted arithmetic or geometric mean parameter in a meta-analysis, or for differences or ratios between parameters, or for interactions, defined as differences (or ratios) between differences. The parmcip command is a lower-level utility, inputting variables containing estimates, standard errors, and degrees of freedom, and outputting variables containing confidence limits and p-values. As an example, we can input genotype frequencies and calculate confidence intervals for geometric mean homozygote/heterozygote ratios for genetic polymorphisms, measuring the size and direction of departures from Hardy-Weinberg equilibrium.

Scholarly edition
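The metaparm idea in the "parmest and extensions" entry rests on a simple fact: for independent estimates b_i with standard errors s_i, a linear combination sum(w_i * b_i) has standard error sqrt(sum(w_i^2 * s_i^2)). The Python sketch below applies this to an inverse-variance weighted mean of hypothetical log-scale estimates (exponentiated to give a geometric mean) and to a difference between two of them. It uses Normal-based limits only, the function names and numbers are our own, and it illustrates the arithmetic rather than the metaparm implementation.

```python
import math
from statistics import NormalDist


def linear_combination(estimates, stderrs, weights, level=95.0):
    """Estimate, SE and Normal-based CI of sum(w * b) for independent b."""
    est = sum(w * b for w, b in zip(weights, estimates))
    se = math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, stderrs)))
    mult = NormalDist().inv_cdf(0.5 + level / 200.0)
    return est, se, est - mult * se, est + mult * se


def inverse_variance_weights(stderrs):
    """Weights proportional to 1 / SE^2, scaled to sum to 1."""
    inv = [1.0 / s ** 2 for s in stderrs]
    total = sum(inv)
    return [v / total for v in inv]


# Hypothetical log-scale estimates from three independent studies.
logb = [0.2, 0.5, 0.3]
se = [0.1, 0.2, 0.1]

w = inverse_variance_weights(se)
est, se_comb, lo, hi = linear_combination(logb, se, w)
gm, gm_lo, gm_hi = math.exp(est), math.exp(lo), math.exp(hi)  # geometric mean

# Difference between the first two estimates, with its own CI.
diff = linear_combination(logb[:2], se[:2], [1.0, -1.0])
print(gm, gm_lo, gm_hi, diff)
```

Working on the log scale and exponentiating at the end is what turns a weighted arithmetic mean of logs into a weighted geometric mean, and a difference of logs into a ratio.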

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
