Professor Paula Brito

Speaker Biography

Professor Paula Brito is an Associate Professor at the Faculty of Economics of the University of Porto, and a member of the Artificial Intelligence and Decision Support Research Group (LIAAD) of INESC TEC, Portugal. She holds a doctorate in Applied Mathematics from the University Paris Dauphine, and a Habilitation in Applied Mathematics from the University of Porto.
Her current research focuses on the analysis of multidimensional complex data, known as symbolic data, for which she develops statistical approaches and multivariate analysis methodologies. She has been involved in two European research projects and coordinates the Portuguese participation in the H2020 FinTech project. Paula Brito was president of the International Association for Statistical Computing (IASC) from 2013 to 2015. She chaired COMPSTAT 2008 and is chair of the upcoming IFCS 2022 conference.

 

Talk Abstract

Symbolic Data Analysis is concerned with analysing data that has intrinsic variability, which must be taken into account. In Data Mining, Multivariate Data Analysis and classical Statistics, the elements under analysis are generally individual units for which a single value is recorded for each variable – e.g., individuals described by age, salary, education level, etc. However, when the elements of interest are classes or groups of some kind – the citizens living in given towns; car models, rather than specific vehicles – then there is variability inherent to the data. Symbolic data goes beyond the usual data representation model by considering variables whose observed values for each unit are no longer necessarily single real values or categories, but may take the form of sets, intervals, or, more generally, distributions.

In this talk, we introduce and motivate the field of Symbolic Data Analysis, detailing the new variable types introduced to represent variability, and illustrating with some examples. We consider in particular the case of interval-valued data, i.e., where for each unit under analysis an interval of R is recorded for each variable, focusing on the parametric modelling proposed by Brito & Duarte Silva (2012). This modelling, based on the representation of each observed interval by its MidPoint and LogRange, for which Multivariate Normal and Skew-Normal distributions are assumed, then allows for multivariate parametric analysis of multidimensional interval-valued data.
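As a rough illustration of the MidPoint and LogRange representation mentioned above, the sketch below converts an interval [lower, upper] into that pair and back. The function names and the sample intervals are hypothetical, chosen only for this example; the sketch covers the change of representation only, not the Normal or Skew-Normal modelling itself.

```python
import math

def to_midpoint_logrange(lower, upper):
    """Represent the interval [lower, upper] by its midpoint and log-range."""
    if upper <= lower:
        # The log-range is only defined for intervals of positive length.
        raise ValueError("interval must satisfy upper > lower")
    midpoint = (lower + upper) / 2.0
    log_range = math.log(upper - lower)
    return midpoint, log_range

def from_midpoint_logrange(midpoint, log_range):
    """Recover the interval bounds from a (midpoint, log-range) pair."""
    half_range = math.exp(log_range) / 2.0
    return midpoint - half_range, midpoint + half_range

# Made-up intervals, e.g. the observed range of one variable for two groups.
intervals = [(1.0, 3.0), (10.0, 20.0)]
encoded = [to_midpoint_logrange(lo, hi) for lo, hi in intervals]
decoded = [from_midpoint_logrange(m, r) for m, r in encoded]
```

Taking the logarithm of the range maps a strictly positive quantity onto the whole real line, which is one plausible reason an unbounded distribution such as the Normal can be assumed for the transformed pair.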

Next we consider the case of numerical data described by empirical distributions, known as histogram data. We introduce alternative representations of histogram observations, observing that interval-valued data constitutes a special case of those. Linear models for such distributional variables are proposed, which rely on the representation of histograms by the associated quantile functions.
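To make the quantile-function representation of a histogram more concrete, here is a minimal sketch that evaluates the quantile function of a histogram-valued observation. It assumes that values are uniformly distributed within each bin, which is an assumption of this example rather than something stated in the abstract; the function name and the sample data are likewise hypothetical.

```python
def histogram_quantile(edges, weights, t):
    """Quantile function of a histogram, assuming uniformity within each bin.

    edges   -- increasing bin boundaries, one more entry than weights
    weights -- positive relative frequencies of the bins, summing to 1
    t       -- probability level in [0, 1]
    """
    if len(edges) != len(weights) + 1:
        raise ValueError("need exactly one more edge than weights")
    if any(w <= 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")

    cumulative = 0.0
    last = len(weights) - 1
    for i, w in enumerate(weights):
        if t <= cumulative + w or i == last:
            # Linear interpolation inside bin i; clamp against rounding error.
            fraction = min((t - cumulative) / w, 1.0)
            return edges[i] + fraction * (edges[i + 1] - edges[i])
        cumulative += w

# A made-up histogram with three bins.
edges = [0.0, 10.0, 20.0, 40.0]
weights = [0.2, 0.5, 0.3]
median = histogram_quantile(edges, weights, 0.5)

# An interval [2, 6] treated as a histogram with a single bin of weight 1.
interval_quartile = histogram_quantile([2.0, 6.0], [1.0], 0.25)
```

Under this uniformity assumption, an interval is simply a one-bin histogram, and its quantile function reduces to a straight line between the two bounds.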

We conclude by discussing open issues and research perspectives.

 

Future DSI online seminar series

To stay up to date with future DSI seminar series, join our mailing list by emailing DSI events manager Ping Huang.

Registration is now closed.