Abstract:

Representing high-dimensional biological data as a graph, with features linked by biological and thermodynamic laws, is a very promising way to deal with the overwhelming complexity of biological systems. However, this approach can be used only if information is available about how features and attributes are connected biologically. Here we would like to draw attention to alternative methods for representing high-dimensional data as a graph when no connections are established a priori. First, correlation-prediction graphs have been constructed to represent the gene methylation profiles of individuals and can be used as a marker of survival [1]. Second, there is an algorithm, first described by Zanin and Boccaletti, that establishes links between parameters (nodes) without any a priori knowledge of their interactions [2]: a linear regression model is fitted to every pair of analytes, and the residual distances from these models are used to construct the graph. They termed this approach a “parenclitic” network representation, from the Greek term for “deviation”. Parenclitic networks have been successfully applied to the detection of key genes and metabolites in different diseases; see [3] for a review. In [4] we applied this methodology to machine learning classification in order to identify signatures of cancer development from human DNA methylation data. Third, because the interaction of two features (at least in biological systems) often cannot be described by a linear model, we proposed to use 2-dimensional kernel density estimation (2DKDE) to model the control distribution [5]. Finally, in [6] we introduced a variation of parenclitic networks that can be called synolytic, from the Greek word for “ensemble”.
In principle, these networks can be considered an ensemble of classifiers in graph form, and thus a kind of correlation network in which the correlation lies in the changes between two classes (e.g. disease and non-disease). These networks have been successfully used to detect age-related trajectories in Down syndrome [7] and to predict survival for severely ill COVID-19 patients [8,9]. We are now developing these methods further in combination with next-generation AI.
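
The pairwise-regression construction described above can be illustrated with a minimal sketch. It is not the implementation used in [2] or [4]: the function names, the data layout (one list of feature values per sample) and the choice to weight each edge by the subject's residual divided by the spread of the control residuals are assumptions made for illustration only.

```python
from itertools import combinations
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def parenclitic_network(controls, subject):
    """Build a weighted graph for one subject (hypothetical helper).

    controls: list of control samples, each a list of feature values.
    subject:  one sample with the same features.
    Returns {(i, j): weight}, where the weight is the subject's absolute
    residual from the control regression of feature j on feature i,
    scaled by the standard deviation of the control residuals.
    Assumes the control residuals are not all zero.
    """
    weights = {}
    for i, j in combinations(range(len(subject)), 2):
        x = [sample[i] for sample in controls]
        y = [sample[j] for sample in controls]
        slope, intercept = fit_line(x, y)
        residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        sigma = pstdev(residuals)
        deviation = subject[j] - (slope * subject[i] + intercept)
        weights[(i, j)] = abs(deviation) / sigma
    return weights


# Toy control group: feature 1 is roughly twice feature 0, feature 2 is noise.
controls = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.7],
    [3.0, 6.2, 0.4],
    [4.0, 7.8, 0.6],
    [5.0, 10.1, 0.5],
]
typical = parenclitic_network(controls, [3.0, 6.0, 0.5])
deviant = parenclitic_network(controls, [3.0, 12.0, 0.5])
print(typical[(0, 1)], deviant[(0, 1)])
```

In this toy example the second subject breaks the control relationship between features 0 and 1, so the edge (0, 1) receives a much larger weight than for the first subject. In practice, edges are often thresholded and graph-level indices of the resulting network are then used as inputs to a classifier.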

  1. T. Bartlett and A. Zaikin, “Detection of Epigenomic Network Community Oncomarkers”, Annals of Applied Statistics 10, 1373-1396 (2016).
  2. Zanin M, Boccaletti S. Complex networks analysis of obstructive nephropathy data. Chaos, Interdiscip. J. Nonlinear Sci. 2011;21:033103.
  3. Zanin M, Papo D, Sousa PA, Menasalvas E, Nicchi A, Kubik E, et al. Combining complex networks and data mining: why and how. Phys Rep 2016;635:1–44.
  4. A. Karsakov, T. Bartlett, I. Meyerov, M. Ivanchenko, and A. Zaikin, “Parenclitic network analysis of methylation data for cancer identification”, PLOS ONE 12(1), e0169661 (2017).
  5. H.J. Whitwell, O. Blyuss, J.F. Timms, and A. Zaikin, “Parenclitic networks for predicting ovarian cancer”, Oncotarget 9:32, 22717-22726 (2018).
  6. Tatiana Nazarenko, Harry James Whitwell, Oleg Blyuss, Alexey Zaikin, “Parenclitic and Synolytic networks revisited”, Frontiers in Genetics 12, 733783 (2021).
  7. Krivonosov M, Nazarenko T, Bacalini M, Franceschi C, Zaikin A, Ivanchenko M. DNA methylation changes with age as a complex system: a parenclitic network approach to a family-based cohort of patients with Down syndrome. bioRxiv https://doi.org/10.1101/2020.03.10.986505.
  8. V. Demichev et al., “A proteomic survival predictor for COVID-19 patients in intensive care”, PLOS Digital Health, (2022). https://doi.org/10.1371/journal.pdig.0000007
  9. Demichev, V., Tober-Lau, P., Lemke, O., Nazarenko, T., Thibeault, C., Whitwell, H., … Zaikin, A., … Schmidt, S. A time-resolved proteomic and prognostic map of COVID-19. Cell Systems 12, 780-794 (2021), http://dx.doi.org/10.1016/j.cels.2021.05.005
