Imperial College London

Dr Ioanna Papatsouma

Faculty of Natural Sciences, Department of Mathematics

Senior Teaching Fellow in Statistics
 
 
 

Contact

 

i.papatsouma

 
 

Location

 

533 Huxley Building, South Kensington Campus



 

Publications


3 results found

Costa E, Papatsouma I, Markos A, 2023, Benchmarking distance-based partitioning methods for mixed-type data, Advances in Data Analysis and Classification, Vol: 17, Pages: 701-724, ISSN: 1862-5347

Clustering mixed-type data, that is, observation-by-variable data that consist of both continuous and categorical variables, poses novel challenges. Foremost among these challenges is the choice of the most appropriate clustering method for the data. This paper presents a benchmarking study comparing eight distance-based partitioning methods for mixed-type data in terms of cluster recovery performance. A series of simulations carried out under a full factorial design is presented, examining the effect of a variety of factors on cluster recovery. The amount of cluster overlap, the percentage of categorical variables in the data set, the number of clusters and the number of observations had the largest effects on cluster recovery. In most of the tested scenarios, KAMILA, K-Prototypes and sequential Factor Analysis and K-Means clustering typically performed better than other methods. The study can be a useful reference for practitioners in the choice of the most appropriate method.

Journal article

Papatsouma I, Farmakis N, 2020, Approximating symmetric distributions via sampling and coefficient of variation, Communications in Statistics - Theory and Methods, Vol: 49, Pages: 61-77, ISSN: 0361-0926

The Coefficient of Variation is one of the most commonly used statistical tools across various scientific fields. This paper proposes a use of the Coefficient of Variation, obtained by sampling, to define the polynomial probability density function (pdf) of a continuous and symmetric random variable on the interval [a, b]. The basic idea behind the first proposed algorithm is the transformation of the interval from [a, b] to [0, b-a]. The chi-square goodness-of-fit test is used to compare the proposed (observed) sample distribution with the expected probability distribution. The experimental results show that the collected data are approximated by the proposed pdf. The second algorithm proposes a new method to get a fast estimate for the degree of the polynomial pdf when the random variable is normally distributed. Using the known percentages of values that lie within one, two and three standard deviations of the mean, respectively, the so-called three-sigma rule of thumb, we conclude that the degree of the polynomial pdf takes values between 1.8127 and 1.8642. In the case of a Laplace (μ, b) distribution, we conclude that the degree of the polynomial pdf takes values greater than 1. All necessary calculations and graphs are produced using the statistical software R.

Journal article

Papatsouma I, Mahmoudvand R, Farmakis N, 2019, Evaluating the goodness of the sample coefficient of variation via discrete uniform distribution, Statistics, Optimization & Information Computing, Vol: 7, Pages: 642-652, ISSN: 2311-004X

Discrete uniform distribution (DUD) is one of the simplest probability models, but it is now introduced as the main tool for the evaluation of resampling techniques which are rapidly entering data analysis and discovering useful information for the researchers. In this paper we evaluate whether the sample coefficient of variation (CV) is a good estimator for the population CV, when the random variable (r.v.) follows the DUD. A method is proposed to obtain the percentage of the number of samples where the CV lies within the bounds of the corresponding population CV and this value is used as a measure of goodness. Samples both with replacement and without replacement are examined, indicating that the goodness of the sample CV estimator increases with the sample size. The overall study gives a good idea of whether the sample CV is generally a good estimator. A real-life data set is analyzed to demonstrate the applicability of the proposed method in practice and the results are interpreted.

Journal article

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
