
Search results

  • Journal article
    Battey HS, 2019,

    On sparsity scales and covariance matrix transformations

    , Biometrika, ISSN: 0006-3444

    We develop a theory of covariance and concentration matrix estimation on any given or estimated sparsity scale when the matrix dimension is larger than the sample size. Non-standard sparsity scales are justified when such matrices are nuisance parameters, distinct from interest parameters, which should always have a direct subject-matter interpretation. The matrix logarithmic and inverse scales are studied as special cases, with the corollary that a constrained optimization-based approach is unnecessary for estimating a sparse concentration matrix. It is shown through simulations that, for large unstructured covariance matrices, there can be appreciable advantages to estimating a sparse approximation to the log-transformed covariance matrix and converting the conclusions back to the scale of interest.
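The log-scale strategy sketched in the abstract can be illustrated in a few lines: move to the matrix-logarithm scale, sparsify there, and map back. This is a minimal sketch, not the paper's estimator; the soft-thresholding rule and the threshold value are assumptions made purely for the example.

```python
import numpy as np

def mat_fun(S, f):
    # Apply a scalar function to a symmetric matrix through its eigendecomposition.
    w, V = np.linalg.eigh(S)
    return V @ np.diag(f(w)) @ V.T

def log_scale_sparsify(S, lam):
    # Move to the matrix-logarithm scale, soft-threshold the off-diagonal
    # entries there, then map back to the covariance scale.
    L = mat_fun(S, np.log)
    T = np.sign(L) * np.maximum(np.abs(L) - lam, 0.0)
    np.fill_diagonal(T, np.diag(L))      # leave the diagonal untouched
    return mat_fun(T, np.exp)            # the result is automatically positive definite

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
S = A @ A.T + 5 * np.eye(5)              # a well-conditioned covariance matrix
S_hat = log_scale_sparsify(S, 0.05)
```

Whatever thresholding rule is used, mapping back through the matrix exponential guarantees positive definiteness, which is one attraction of working on the log scale.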

  • Journal article
    Battey HS, Cox DR, 2018,

    Large numbers of explanatory variables: a probabilistic assessment

    , Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol: 474, ISSN: 1364-5021

    Recently, Cox and Battey (2017 Proc. Natl Acad. Sci. USA 114, 8592–8595 (doi:10.1073/pnas.1703764114)) outlined a procedure for regression analysis when there are a small number of study individuals and a large number of potential explanatory variables, but relatively few of the latter have a real effect. The present paper reports more formal statistical properties. The results are intended primarily to guide the choice of key tuning parameters.

  • Journal article
    Avella M, Battey HS, Fan J, Li Q et al., 2018,

    Robust estimation of high-dimensional covariance and precision matrices

    , Biometrika, Vol: 105, Pages: 271-284

    High-dimensional data are often most plausibly generated from distributions with complex structure and leptokurtosis in some or all components. Covariance and precision matrices provide a useful summary of such structure, yet the performance of popular matrix estimators typically hinges upon a sub-Gaussianity assumption. This paper presents robust matrix estimators whose performance is guaranteed for a much richer class of distributions. The proposed estimators, under a bounded fourth moment assumption, achieve the same minimax convergence rates as do existing methods under a sub-Gaussianity assumption. Consistency of the proposed estimators is also established under the weak assumption of bounded 2+ϵ moments for ϵ ∈ (0, 2). The associated convergence rates depend on ϵ.
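A simple estimator in the spirit of the robustness discussed above is an element-wise truncated sample covariance, in which each cross-product is winsorised before averaging. The clipping rule and the threshold below are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def truncated_covariance(X, tau):
    # Element-wise robust covariance: each cross-product is clipped to
    # [-tau, tau] before averaging, limiting the influence of heavy tails.
    Xc = X - X.mean(axis=0)
    n, p = Xc.shape
    S = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            S[i, j] = np.clip(Xc[:, i] * Xc[:, j], -tau, tau).mean()
    return S

rng = np.random.default_rng(1)
X = rng.standard_t(df=3, size=(1000, 4))   # heavy tails: only moments below 3 exist
S_robust = truncated_covariance(X, tau=20.0)
```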

  • Journal article
    Battey HS, Zhu Z, Fan J, Lu J, Liu H et al., 2018,

    Distributed testing and estimation in sparse high dimensional models

    , Annals of Statistics, Vol: 46, Pages: 1352-1382

    This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from k subsamples of size n/k, where n is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large k can be, as n grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with access to the full sample. Thorough numerical results are provided to back up the theory.
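The divide-and-conquer aggregation can be shown in miniature, with plain low-dimensional least squares standing in for the paper's likelihood-based statistics:

```python
import numpy as np

def dc_ols(X, y, k):
    # Fit ordinary least squares on k disjoint subsamples of size ~ n/k,
    # then aggregate by averaging the k point estimates.
    blocks = np.array_split(np.arange(len(y)), k)
    betas = [np.linalg.lstsq(X[b], y[b], rcond=None)[0] for b in blocks]
    return np.mean(betas, axis=0)

rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(size=n)
beta_dc = dc_ols(X, y, k=10)   # close to the full-sample estimate when k << n
```

For this well-conditioned low-dimensional setting the averaged estimator tracks the full-sample fit closely; the paper's contribution is characterising how large k may grow before that stops being true.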

  • Journal article
    Cox DR, Battey HS, 2017,

    Large numbers of explanatory variables, a semi-descriptive analysis

    , Proceedings of the National Academy of Sciences of the United States of America, Vol: 114, Pages: 8592-8595

    Data with a relatively small number of study individuals and a very large number of potential explanatory features arise particularly, but by no means only, in genomics. A powerful method of analysis, the lasso [Tibshirani R (1996) J Roy Stat Soc B 58:267–288], takes account of an assumed sparsity of effects, that is, that most of the features are nugatory. Standard criteria for model fitting, such as the method of least squares, are modified by imposing a penalty for each explanatory variable used. There results a single model, leaving open the possibility that other sparse choices of explanatory features fit virtually equally well. The method suggested in this paper aims to specify simple models that are essentially equally effective, leaving detailed interpretation to the specifics of the particular study. The method hinges on the ability to make initially a very large number of separate analyses, allowing each explanatory feature to be assessed in combination with many other such features. Further stages allow the assessment of more complex patterns such as nonlinear and interactive dependences. The method has formal similarities to so-called partially balanced incomplete block designs introduced 80 years ago [Yates F (1936) J Agric Sci 26:424–455] for the study of large-scale plant breeding trials. The emphasis in this paper is strongly on exploratory analysis; the more formal statistical properties obtained under idealized assumptions will be reported separately.

  • Journal article
    Bennedsen M, Lunde A, Pakkanen MS, 2017,

    Hybrid scheme for Brownian semistationary processes

    , Finance and Stochastics, Vol: 21, Pages: 931-965, ISSN: 1432-1122

    We introduce a simulation scheme for Brownian semistationary processes, which is based on discretizing the stochastic integral representation of the process in the time domain. We assume that the kernel function of the process is regularly varying at zero. The novel feature of the scheme is to approximate the kernel function by a power function near zero and by a step function elsewhere. The resulting approximation of the process is a combination of Wiener integrals of the power function and a Riemann sum, which is why we call this method a hybrid scheme. Our main theoretical result describes the asymptotics of the mean square error of the hybrid scheme and we observe that the scheme leads to a substantial improvement of accuracy compared to the ordinary forward Riemann-sum scheme, while having the same computational complexity. We exemplify the use of the hybrid scheme by two numerical experiments, where we examine the finite-sample properties of an estimator of the roughness parameter of a Brownian semistationary process and study Monte Carlo option pricing in the rough Bergomi model of Bayer et al. (2015), respectively.

  • Journal article
    Griffie J, Shlomovich L, Williamson D, Shannon M, Aarons J, Khuon S, Burn G, Boelen L, Peters R, Cope A, Cohen E, Rubin-Delanchy P, Owen D et al., 2017,

    3D Bayesian cluster analysis of super-resolution data reveals LAT recruitment to the T cell synapse

    , Scientific Reports, Vol: 7, ISSN: 2045-2322

    Single-molecule localisation microscopy (SMLM) allows the localisation of fluorophores with a precision of 10–30 nm, revealing the cell’s nanoscale architecture at the molecular level. Recently, SMLM has been extended to 3D, providing a unique insight into cellular machinery. Although cluster analysis techniques have been developed for 2D SMLM data sets, few have been applied to 3D. This lack of quantification tools can be explained by the relative novelty of imaging techniques such as interferometric photo-activated localisation microscopy (iPALM). Also, existing methods that could be extended to 3D SMLM are usually subject to user defined analysis parameters, which remains a major drawback. Here, we present a new open source cluster analysis method for 3D SMLM data, free of user definable parameters, relying on a model-based Bayesian approach which takes full account of the individual localisation precisions in all three dimensions. The accuracy and reliability of the method is validated using simulated data sets. This tool is then deployed on novel experimental data as a proof of concept, illustrating the recruitment of LAT to the T-cell immunological synapse in data acquired by iPALM, providing ~10 nm isotropic resolution.

  • Journal article
    Zhuang L, Walden AT, 2017,

    Sample mean versus sample Fréchet mean for combining complex Wishart matrices: a statistical study

    , IEEE Transactions on Signal Processing, Vol: 65, Pages: 4551-4561, ISSN: 1941-0476

    The space of covariance matrices is a non-Euclidean space. The matrices form a manifold which, if equipped with a Riemannian metric, becomes a Riemannian manifold, and recently this idea has been used for comparison and clustering of complex-valued spectral matrices, which at a given frequency are typically modelled as complex Wishart-distributed random matrices. Identically distributed sample complex Wishart matrices can be combined via a standard sample mean to derive a more stable overall estimator. However, using the Riemannian geometry their so-called sample Fréchet mean can also be found. We derive the expected value of the determinant of the sample Fréchet mean and the expected value of the sample Fréchet mean itself. The population Fréchet mean is shown to be a scaled version of the true covariance matrix. The risk under convex loss functions for the standard sample mean is never larger than for the Fréchet mean. In simulations the sample mean also performs better for the estimation of an important functional derived from the estimated covariance matrix, namely partial coherence.
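For real symmetric positive definite matrices (a real-valued analogue of the complex Wishart setting above), the sample Fréchet mean under the affine-invariant Riemannian metric can be computed by a standard Karcher-mean fixed-point iteration. This is an illustrative sketch, not the paper's derivation.

```python
import numpy as np

def mat_fun(S, f):
    # Apply a scalar function to a symmetric matrix via its eigendecomposition.
    w, V = np.linalg.eigh(S)
    return V @ np.diag(f(w)) @ V.T

def frechet_mean(mats, iters=50):
    # Karcher-mean iteration: map the samples to the tangent space at the
    # current iterate via the matrix logarithm, average there, and map back.
    M = np.mean(mats, axis=0)                              # start at the sample mean
    for _ in range(iters):
        Mh, Mih = mat_fun(M, np.sqrt), mat_fun(M, lambda w: 1 / np.sqrt(w))
        logs = [mat_fun(0.5 * (B + B.T), np.log)           # symmetrise for stability
                for B in (Mih @ S @ Mih for S in mats)]
        M = Mh @ mat_fun(np.mean(logs, axis=0), np.exp) @ Mh
    return M

# For commuting matrices the Fréchet mean reduces to the geometric mean:
mats = [np.diag([1.0, 1.0]), np.diag([4.0, 4.0])]
M = frechet_mean(mats)
```

Here the Riemannian mean is diag(2, 2), the geometric mean, whereas the ordinary sample mean is diag(2.5, 2.5), which is the kind of discrepancy the paper's risk comparison quantifies.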

  • Conference paper
    Schon C, Adams NM, Evangelou M,

    Clustering and monitoring edge behaviour in enterprise network traffic

    , IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

    This paper takes an unsupervised learning approach for monitoring edge activity within an enterprise computer network. Using NetFlow records, features are gathered across the active connections (edges) in 15-minute time windows. Then, edges are grouped into clusters using the k-means algorithm. This process is repeated over contiguous windows. A series of informative indicators are derived by examining the relationship of edges with the observed cluster structure. This leads to an intuitive method for monitoring network behaviour and a temporal description of edge behaviour at global and local levels.
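A minimal version of the per-window clustering step is Lloyd's k-means applied to a matrix of per-edge features; the feature construction from NetFlow records is omitted, and the two synthetic groups below are stand-ins for behavioural classes of edges.

```python
import numpy as np

def kmeans(X, k, iters=100):
    # Lloyd's algorithm with a naive "first k points" initialisation
    # (k-means++ would be the usual refinement).
    C = X[:k].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        newC = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                         for j in range(k)])
        if np.allclose(newC, C):
            break
        C = newC
    return labels, C

rng = np.random.default_rng(3)
edges = np.vstack([rng.normal(0.0, 0.3, (50, 2)),    # one behavioural group of edges
                   rng.normal(5.0, 0.3, (50, 2))])   # a second, well-separated group
labels, centres = kmeans(edges, k=2)
```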

  • Journal article
    Battey HS, 2017,

    Eigen structure of a new class of structured covariance and inverse covariance matrices

    , Bernoulli, Vol: 23, Pages: 3166-3177

    There is a one-to-one mapping between a p-dimensional strictly positive definite covariance matrix Σ and its matrix logarithm L. We exploit this relationship to study the structure induced on Σ through a sparsity constraint on L. Consider L as a random matrix generated through a basis expansion, with the support of the basis coefficients taken as a simple random sample of size s = s* from the index set [p(p + 1)/2] = {1, . . . , p(p + 1)/2}. We find that the expected number of non-unit eigenvalues of Σ, denoted E[|A|], is approximated with near perfect accuracy by the solution d of the equation

        p + [p(p − 1)/(2(p + 1))] [log(p/(p − d)) − d/(2p(p − d))] − s* = 0.

    Furthermore, the corresponding eigenvectors are shown to possess only p − |Aᶜ| nonzero entries. We use this result to elucidate the precise structure induced on Σ and Σ⁻¹. We demonstrate that a positive definite symmetric matrix whose matrix logarithm is sparse is significantly less sparse in the original domain. This finding has important implications in high-dimensional statistics, where it is important to exploit structure in order to construct consistent estimators in non-trivial norms. An estimator exploiting the structure of the proposed class is presented.
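The root d of an equation of this form can be found numerically; a dense grid search for the first sign change of the left-hand side is enough for illustration (the values of p and s* below are arbitrary, and the example assumes s* > p so that the left-hand side starts negative).

```python
import numpy as np

def expected_nonunit_eigs(p, s_star):
    # Smallest root d of
    #   p + p(p-1)/(2(p+1)) * [log(p/(p-d)) - d/(2p(p-d))] - s_star = 0,
    # located at the first sign change of the left-hand side on a dense grid.
    d = np.linspace(1e-9, p - 1e-6, 500_000)
    g = (p + p * (p - 1) / (2 * (p + 1))
         * (np.log(p / (p - d)) - d / (2 * p * (p - d))) - s_star)
    i = np.flatnonzero(np.diff(np.sign(g)))[0]
    return d[i]

d_hat = expected_nonunit_eigs(p=50, s_star=100)
```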

  • Journal article
    Pakkanen MS, Sottinen T, Yazigi A, 2017,

    On the conditional small ball property of multivariate Lévy-driven moving average processes

    , Stochastic Processes and their Applications, Vol: 127, Pages: 749-782, ISSN: 0304-4149

    We study whether a multivariate Lévy-driven moving average process can shadow arbitrarily closely any continuous path, starting from the present value of the process, with positive conditional probability, which we call the conditional small ball property. Our main results establish the conditional small ball property for Lévy-driven moving average processes under natural non-degeneracy conditions on the kernel function of the process and on the driving Lévy process. We discuss in depth how to verify these conditions in practice. As concrete examples, to which our results apply, we consider fractional Lévy processes and multivariate Lévy-driven Ornstein–Uhlenbeck processes.

  • Journal article
    Zhang Q, Filippi S, Gretton A, Sejdinovic D et al., 2017,

    Large-Scale Kernel Methods for Independence Testing

    , Statistics and Computing, Vol: 28, Pages: 113-130, ISSN: 1573-1375

    Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable tradeoff between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nyström and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our novel large-scale methods give comparable performance with existing methods whilst using significantly less computation time and memory.
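The random Fourier feature idea can be sketched as follows: replace each Gaussian kernel by an explicit finite-dimensional feature map, so that a cross-covariance statistic costs O(nD²) rather than O(n²). The statistic below is a simplified HSIC-style quantity for illustration; the bandwidths and feature count are arbitrary choices, not the paper's tuning.

```python
import numpy as np

def rff(X, D, sigma, seed):
    # Random Fourier features approximating a Gaussian kernel of bandwidth sigma.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def dependence_stat(X, Y, D=100, sigma=1.0):
    # Squared Frobenius norm of the empirical cross-covariance between the
    # two feature maps: an RFF approximation of an HSIC-type statistic.
    Zx, Zy = rff(X, D, sigma, 0), rff(Y, D, sigma, 1)
    Zx, Zy = Zx - Zx.mean(0), Zy - Zy.mean(0)
    C = Zx.T @ Zy / len(X)
    return float((C ** 2).sum())

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 1))
stat_dep = dependence_stat(X, X + 0.1 * rng.normal(size=(500, 1)))  # dependent pair
stat_ind = dependence_stat(X, rng.normal(size=(500, 1)))            # independent pair
```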

  • Journal article
    Chandna S, Walden AT, 2016,

    A frequency domain test for propriety of complex-valued vector time series

    , IEEE Transactions on Signal Processing, Vol: 65, Pages: 1425-1436, ISSN: 1941-0476

    This paper proposes a frequency domain approach to test the hypothesis that a stationary complex-valued vector time series is proper, i.e., for testing whether the vector time series is uncorrelated with its complex conjugate. If the hypothesis is rejected, frequency bands causing the rejection will be identified and might usefully be related to known properties of the physical processes. The test needs the associated spectral matrix, which can be estimated by multitaper methods using, say, K tapers. Standard asymptotic distributions for the test statistic are of no use since they would require K → ∞, but as K increases so does resolution bandwidth, which causes spectral blurring. In many analyses K is necessarily kept small, and hence our efforts are directed at practical and accurate methodology for hypothesis testing for small K. Our generalized likelihood ratio statistic combined with exact cumulant matching gives very accurate rejection percentages. We also prove that the statistic on which the test is based is comprised of canonical coherencies arising from our complex-valued vector time series. Frequency-specific tests are combined using multiple hypothesis testing to give an overall test. Our methodology is demonstrated on ocean current data collected at different depths in the Labrador Sea. Overall this work extends results on propriety testing for complex-valued vectors to the complex-valued vector time series setting.

  • Conference paper
    Rubin-Delanchy P, Heard NA, 2016,

    Disassortativity of computer networks

    , IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

    Network data is ubiquitous in cyber-security applications. Accurately modelling such data allows discovery of anomalous edges, subgraphs or paths, and is key to many signature-free cyber-security analytics. We present a recurring property of graphs originating from cyber-security applications, often considered a ‘corner case’ in the main literature on network data analysis, that greatly affects the performance of standard ‘off-the-shelf’ techniques. This is the property that similarity, in terms of network behaviour, does not imply connectivity, and in fact the reverse is often true. We call this disassortativity. The phenomenon is illustrated using network flow data collected on an enterprise network, and we show how Big Data analytics designed to detect unusual connectivity patterns can be improved.

  • Conference paper
    Evangelou M, Adams N, 2016,

    Predictability of NetFlow data

    , IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

    The behaviour of individual devices connected to an enterprise network can vary dramatically, as a device’s activity depends on the user operating the device as well as on all behind-the-scenes operations between the device and the network. Being able to understand and predict a device’s behaviour in a network can work as the foundation of an anomaly detection framework, as devices may show abnormal activity as part of a cyber attack. The aim of this work is the construction of a predictive regression model for a device’s behaviour at normal state. The behaviour of a device is presented by a quantitative response and modelled to depend on historic data recorded by NetFlow.

  • Conference paper
    Whitehouse M, Evangelou M, Adams N, 2016,

    Activity-based temporal anomaly detection in enterprise-cyber security

    , IEEE International Big Data Analytics for Cybersecurity computing (BDAC'16) Workshop, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

    Statistical anomaly detection is emerging as an important complement to signature-based methods for enterprise network defence. In this paper, we isolate a persistent structure in two different enterprise network data sources. This structure provides the basis of a regression-based anomaly detection method. The procedure is demonstrated on a large public domain data set.

  • Journal article
    Filippi S, Holmes CC, Nieto-Barajas LE, 2016,

    Scalable Bayesian nonparametric measures for exploring pairwise dependence via Dirichlet Process Mixtures

    , Electronic Journal of Statistics, Vol: 10, Pages: 3338-3354, ISSN: 1935-7524

    In this article we propose novel Bayesian nonparametric methods using Dirichlet Process Mixture (DPM) models for detecting pairwise dependence between random variables while accounting for uncertainty in the form of the underlying distributions. A key criterion is that the procedures should scale to large data sets. In this regard we find that the formal calculation of the Bayes factor for a dependent-vs.-independent DPM joint probability measure is not feasible computationally. To address this we present Bayesian diagnostic measures for characterising evidence against a “null model” of pairwise independence. In simulation studies, as well as for a real data analysis, we show that our approach provides a useful tool for the exploratory nonparametric Bayesian analysis of large multivariate data sets.

  • Journal article
    Lukkarinen J, Pakkanen MS, 2016,

    Arbitrage without borrowing or short selling?

    , Mathematics and Financial Economics, Vol: 11, Pages: 263-274, ISSN: 1862-9679

    We show that a trader, who starts with no initial wealth and is not allowed to borrow money or short sell assets, is theoretically able to attain positive wealth by continuous trading, provided that she has perfect foresight of future asset prices, given by a continuous semimartingale. Such an arbitrage strategy can be constructed as a process of finite variation that satisfies a seemingly innocuous self-financing condition, formulated using a pathwise Riemann–Stieltjes integral. Our result exemplifies the potential intricacies of formulating economically meaningful self-financing conditions in continuous time, when one leaves the conventional arbitrage-free framework.

  • Journal article
    Filippi S, Holmes C, 2016,

    A Bayesian nonparametric approach to testing for dependence between random variables

    , Bayesian Analysis, Vol: 12, Pages: 919-938, ISSN: 1931-6690

    Nonparametric and nonlinear measures of statistical dependence between pairs of random variables are important tools in modern data analysis. In particular, the emergence of large data sets can now support the relaxation of linearity assumptions implicit in traditional association scores such as correlation. Here we describe a Bayesian nonparametric procedure that leads to a tractable, explicit and analytic quantification of the relative evidence for dependence vs independence. Our approach uses Polya tree priors on the space of probability measures, which can then be embedded within a decision-theoretic test for dependence. Polya tree priors can accommodate known uncertainty in the form of the underlying sampling distribution and provide an explicit posterior probability measure of both dependence and independence. Well-known advantages of having an explicit probability measure include: easy comparison of evidence across different studies; encoding prior information; quantifying changes in dependence across different experimental conditions; and the integration of results within formal decision analysis.

  • Journal article
    Young GA, Lee SMS, 2016,

    Distribution of likelihood-based p-values under a local alternative hypothesis

    , Biometrika, Vol: 103, Pages: 641-652, ISSN: 1464-3510

    We consider inference on a scalar parameter of interest in the presence of a nuisance parameter, using a likelihood-based statistic which is asymptotically normally distributed under the null hypothesis. Higher-order expansions are used to compare the repeated sampling distribution, under a general contiguous alternative hypothesis, of p-values calculated from the asymptotic normal approximation to the null sampling distribution of the statistic with the distribution of p-values calculated by bootstrap approximations. The results of comparisons in terms of power of different testing procedures under an alternative hypothesis are closely related to differences under the null hypothesis, specifically the extent to which testing procedures are conservative or liberal under the null. Empirical examples are given which demonstrate that higher-order asymptotic effects may be seen clearly in small-sample contexts.

  • Journal article
    Bodenham DA, Adams NM, 2016,

    Continuous monitoring for changepoints in data streams using adaptive estimation

    , Statistics and Computing, Vol: 27, Pages: 1257-1270, ISSN: 1573-1375

    Data streams are characterised by a potentially unending sequence of high-frequency observations which are subject to unknown temporal variation. Many modern streaming applications demand the capability to sequentially detect changes as soon as possible after they occur, while continuing to monitor the stream as it evolves. We refer to this problem as continuous monitoring. Sequential algorithms such as CUSUM, EWMA and their more sophisticated variants usually require a pair of parameters to be selected for practical application. However, the choice of parameter values is often based on the anticipated size of the changes and a given choice is unlikely to be optimal for the multiple change sizes which are likely to occur in a streaming data context. To address this critical issue, we introduce a changepoint detection framework based on adaptive forgetting factors that, instead of multiple control parameters, only requires a single parameter to be selected. Simulated results demonstrate that this framework has utility in a continuous monitoring setting. In particular, it reduces the burden of selecting parameters in advance. Moreover, the methodology is demonstrated on real data arising from Foreign Exchange markets.
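The core recursion behind such forgetting-factor schemes is a weighted running mean whose effective sample size is capped. The fixed λ used here is the single control parameter the abstract refers to; the paper's adaptive tuning of λ is not reproduced in this sketch.

```python
def forgetting_mean(stream, lam=0.95):
    # Forgetting-factor mean: w is the effective sample size (capped at
    # 1/(1 - lam)), m the exponentially weighted running mean.
    m, w = 0.0, 0.0
    means = []
    for x in stream:
        w = lam * w + 1.0
        m = m + (x - m) / w
        means.append(m)
    return means

stream = [0.0] * 100 + [10.0] * 100   # a changepoint after observation 100
means = forgetting_mean(stream)
```

Because old observations are down-weighted geometrically, the running mean reacts to the level shift within a few dozen observations instead of averaging it away, which is what makes the recursion useful as the engine of a changepoint detector.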

  • Journal article
    Battey H, Feng Q, Smith RJ, 2016,

    Improving confidence set estimation when parameters are weakly identified

    , Statistics and Probability Letters, Vol: 118, Pages: 117-123, ISSN: 0167-7152

    We consider inference in weakly identified moment condition models when additional partially identifying moment inequality constraints are available. We detail the limiting distribution of the estimation criterion function and consequently propose a confidence set estimator for the true parameter.

  • Conference paper
    Flaxman S, Sejdinovic D, Cunningham JP, Filippi S et al., 2016,

    Bayesian Learning of Kernel Embeddings

    , UAI'16

    Kernel methods are one of the mainstays of machine learning, but the problem of kernel learning remains challenging, with only a few heuristics and very little theory. This is of particular importance in methods based on estimation of kernel mean embeddings of probability measures. For characteristic kernels, which include most commonly used ones, the kernel mean embedding uniquely determines its probability measure, so it can be used to design a powerful statistical testing framework, which includes nonparametric two-sample and independence tests. In practice, however, the performance of these tests can be very sensitive to the choice of kernel and its lengthscale parameters. To address this central issue, we propose a new probabilistic model for kernel mean embeddings, the Bayesian Kernel Embedding model, combining a Gaussian process prior over the reproducing kernel Hilbert space containing the mean embedding with a conjugate likelihood function, thus yielding a closed form posterior over the mean embedding. The posterior mean of our model is closely related to recently proposed shrinkage estimators for kernel mean embeddings, while the posterior uncertainty is a new, interesting feature with various possible applications. Critically for the purposes of kernel learning, our model gives a simple, closed form marginal pseudolikelihood of the observed data given the kernel hyperparameters. This marginal pseudolikelihood can either be optimized to inform the hyperparameter choice or fully Bayesian inference can be used.

  • Journal article
    Schneider-Luftman D, Walden AT, 2016,

    Partial coherence estimation via spectral matrix shrinkage under quadratic loss

    , IEEE Transactions on Signal Processing, Vol: 64, Pages: 5767-5777, ISSN: 1941-0476

    Partial coherence is an important quantity derived from spectral or precision matrices and is used in seismology, meteorology, oceanography, neuroscience and elsewhere. If the number of complex degrees of freedom only slightly exceeds the dimension of the multivariate stationary time series, spectral matrices are poorly conditioned and shrinkage techniques suggest themselves. When true partial coherencies are quite large, then for shrinkage estimators of the diagonal weighting kind it is shown empirically that the minimization of risk using quadratic loss (QL) leads to oracle partial coherence estimators far superior to those derived by minimizing risk using Hilbert-Schmidt (HS) loss. When true partial coherencies are small the methods behave similarly. We derive two new QL estimators for spectral matrices, and new QL and HS estimators for precision matrices. In addition, for the full estimation (non-oracle) case where certain trace expressions must also be estimated, we examine the behaviour of three different QL estimators, the precision matrix one seeming particularly appealing. For the empirical study we carry out exact simulations derived from real EEG data for two individuals, one having large, and the other small, partial coherencies. This ensures our study covers cases of real-world relevance.

  • Journal article
    Pakkanen MS, Réveillac A, 2016,

    Functional limit theorems for generalized variations of the fractional Brownian sheet

    , Bernoulli, Vol: 22, Pages: 1671-1708, ISSN: 1350-7265

    We prove functional central and non-central limit theorems for generalized variations of the anisotropic d-parameter fractional Brownian sheet (fBs) for any natural number d. Whether the central or the non-central limit theorem applies depends on the Hermite rank of the variation functional and on the smallest component of the Hurst parameter vector of the fBs. The limiting process in the former result is another fBs, independent of the original fBs, whereas the limit given by the latter result is a Hermite sheet, which is driven by the same white noise as the original fBs. As an application, we derive functional limit theorems for power variations of the fBs and discuss what is a proper way to interpolate them to ensure functional convergence.

  • Journal article
    Bakoben M, Bellotti AG, Adams NM, 2016,

    Improving clustering performance by incorporating uncertainty

    , Pattern Recognition Letters, Vol: 77, Pages: 28-34, ISSN: 1872-7344

    In more challenging problems the input to a clustering problem is not raw data objects, but rather parametric statistical summaries of the data objects. For example, time series of different lengths may be clustered on the basis of estimated parameters from autoregression models. Such summary procedures usually provide estimates of uncertainty for parameters, and ignoring this source of uncertainty affects the recovery of the true clusters. This paper is concerned with the incorporation of this source of uncertainty in the clustering procedure. A new dissimilarity measure is developed based on geometric overlap of confidence ellipsoids implied by the uncertainty estimates. In extensive simulation studies and a synthetic time series benchmark dataset, this new measure is shown to yield improved performance over standard approaches.
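One concrete way to let estimated uncertainty enter a dissimilarity is to compare whole Gaussian summaries (point estimate plus covariance) rather than point estimates alone. The Bhattacharyya distance below is a stand-in for the paper's confidence-ellipsoid overlap measure, used here purely to illustrate the idea.

```python
import numpy as np

def bhattacharyya(m1, S1, m2, S2):
    # Distance between two Gaussian parameter summaries N(m1, S1), N(m2, S2);
    # large shared uncertainty (overlapping ellipsoids) shrinks the distance.
    S = 0.5 * (S1 + S2)
    dm = m1 - m2
    maha = 0.125 * dm @ np.linalg.solve(S, dm)
    logdet = 0.5 * np.log(np.linalg.det(S) /
                          np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return maha + logdet

m = np.array([0.0, 0.0])
shift = np.array([1.0, 0.0])
tight = 0.01 * np.eye(2)   # precise estimates
wide = 1.0 * np.eye(2)     # uncertain estimates

d_tight = bhattacharyya(m, tight, m + shift, tight)  # confident and separated: large
d_wide = bhattacharyya(m, wide, m + shift, wide)     # uncertain and overlapping: small
```

The same displacement of the point estimates yields a much smaller dissimilarity when the uncertainty regions overlap, which is exactly the behaviour an uncertainty-aware clustering procedure needs.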

  • Journal article
    Nieto-Reyes A, Battey H, 2016,

    A Topologically Valid Definition of Depth for Functional Data

    , Statistical Science, Vol: 31, Pages: 61-79, ISSN: 0883-4237

    The main focus of this work is on providing a formal definition of statistical depth for functional data on the basis of six properties, recognising topological features such as continuity, smoothness and contiguity. Amongst our depth defining properties is one that addresses the delicate challenge of inherent partial observability of functional data, with fulfillment giving rise to a minimal guarantee on the performance of the empirical depth beyond the idealised and practically infeasible case of full observability. As an incidental product, functional depths satisfying our definition achieve a robustness that is commonly ascribed to depth, despite the absence of a formal guarantee in the multivariate definition of depth. We demonstrate the fulfillment or otherwise of our properties for six widely used functional depth proposals, thereby providing a systematic basis for selection of a depth function.

  • Conference paper
    Plasse J, Adams N, 2016,

    Handling Delayed Labels in Temporally Evolving Data Streams

    , 4th IEEE International Conference on Big Data (Big Data), Publisher: IEEE, Pages: 2416-2424
  • Conference paper
    Noble J, Adams NM, 2016,

    Correlation-based Streaming Anomaly Detection in Cyber-Security

    , 16th IEEE International Conference on Data Mining (ICDM), Publisher: IEEE, Pages: 311-318, ISSN: 2375-9232
  • Conference paper
    Bakoben M, Adams N, Bellotti A, 2016,

    Uncertainty aware clustering for behaviour in enterprise networks

    , 16th IEEE International Conference on Data Mining (ICDM), Publisher: IEEE, Pages: 269-272, ISSN: 2375-9232

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
