Prof. Alexander Gorban presents methods of non-linear data modeling based on principal manifolds and principal graphs constructed using the metaphor of elasticity (the elastic principal graph approach). The structure of a principal graph is learned from data by applying a topological grammar, which in the simplest cases leads to the construction of principal curves or trees. To cope better with noise and outliers, a trimmed data approximation term is used, which increases the robustness of the method. Using several examples, he shows the advantages of non-linear objects over linear ones for data approximation. The examples are taken from comparative political science, from the analysis of high-throughput data in molecular biology, and from the analysis of dynamical systems. He also presents ElPiGraph, a scalable and robust method (and software library) for approximating datasets with complex structures that does not require computing the complete data distance matrix or the data point neighbourhood graph. The method withstands high levels of noise and can approximate complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in various fields: in biology, where it can be used to infer gene dynamics from single-cell RNA-Seq data; in medicine, for the analysis of large clinical datasets characterized by mixed data types and missing values; and in astronomy, where it can be used to explore complex structures in the distribution of galaxies. A dedicated tool, ClinTrajan, has been developed for clinical trajectory analysis. Both ElPiGraph and ClinTrajan are openly available.
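
The trimmed data approximation term mentioned above can be illustrated with a small sketch. It is not the ElPiGraph API: the function name `trimmed_mse` and the parameter `trim_radius` are hypothetical, and the sketch assumes that trimming means capping each point's squared distance to its nearest graph node at the squared trimming radius, so that distant outliers contribute only a bounded amount to the approximation error. Only point-to-node distances are computed, never the full point-to-point distance matrix.

```python
import numpy as np


def trimmed_mse(points, nodes, trim_radius):
    """Mean squared distance from each point to its nearest node,
    with each squared distance capped at trim_radius**2.

    Hypothetical helper for illustration; not part of ElPiGraph.
    """
    # Squared distances between every point and every node,
    # shape (n_points, n_nodes).
    diffs = points[:, None, :] - nodes[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Squared distance from each point to its closest node.
    nearest = sq_dists.min(axis=1)
    # Cap the contribution of far-away points (assumed trimming rule).
    return float(np.mean(np.minimum(nearest, trim_radius ** 2)))


# Two points lie exactly on the nodes; the third is a distant outlier.
points = np.array([[0.0, 0.0], [1.0, 0.0], [100.0, 0.0]])
nodes = np.array([[0.0, 0.0], [1.0, 0.0]])

plain = trimmed_mse(points, nodes, np.inf)  # no trimming
robust = trimmed_mse(points, nodes, 2.0)    # trimming radius of 2
```

Without trimming, the single outlier dominates the error; with a finite trimming radius its contribution is bounded, so it pulls the approximating nodes far less strongly during fitting.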