Compositional approaches for the characterization and mining of omics data

The full list of publications related to the project can be found at this link. Below is a summary of the scientific results.

Characterization of Omics Data

A new paradigm for sequence similarity, based on an alignment-free approach considering maximal and/or average longest common matches with mismatches between sequences, has been proposed in (Apostolico, Guerra, Pizzi, 2014). It has been proved that the introduction of mismatches improves the identification of similar or divergent species in the context of phylogenetic analysis.
Entropic profiles have been extensively studied, in particular with respect to their relationship with other models for the characterization of interesting patterns in biosequences (Parida, Rombo, Pizzi, 2014). Entropic profiles proved to be a more specific class in the context of pattern discovery (Ornamenti, Parida, Rombo, Pizzi, in preparation).
An extensive comparative study of models and algorithms for searching for repetitions in biological networks has been presented in (Panni, Rombo, 2014).
A model for alignment-free comparison that takes into account the quality values in NGS data has been proposed (Comin, Schimd, 2014).
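The idea of scoring similarity through longest common matches with mismatches can be illustrated with a small sketch. The code below is a naive illustration only, not the method of the cited papers: the function names, the choice of averaging over the positions of the first sequence, and the convention that mismatched positions count toward the match length are all assumptions made for this example.

```python
def longest_match_from(x, i, y, j, k):
    """Length of the longest common extension of x[i:] and y[j:]
    that contains at most k mismatches (mismatched positions count
    toward the length)."""
    length = 0
    mismatches = 0
    while i + length < len(x) and j + length < len(y):
        if x[i + length] != y[j + length]:
            mismatches += 1
            if mismatches > k:
                break
        length += 1
    return length


def average_longest_match(x, y, k):
    """For every position i of x, take the longest match with at most
    k mismatches against any position of y, then average over i.
    Naive cubic-time illustration (hypothetical helper)."""
    if not x:
        return 0.0
    total = 0
    for i in range(len(x)):
        total += max(
            (longest_match_from(x, i, y, j, k) for j in range(len(y))),
            default=0,
        )
    return total / len(x)


if __name__ == "__main__":
    # Allowing one mismatch lets the match span the C/G substitution.
    print(average_longest_match("ACGT", "AGGT", 0))
    print(average_longest_match("ACGT", "AGGT", 1))
```

A larger average indicates that the two sequences share longer approximately matching regions; how such a score is turned into a distance for phylogenetic analysis is described in the cited papers and is not reproduced here.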

Analysis of Omics Data

Design and development of the first subquadratic algorithm, running in O(k n^2 / log n) time, for the computation of the longest common substring with mismatches between two sequences, for application in the context of phylogenetic reconstruction. The result was later improved to O(n^2 / log n) (Apostolico, Guerra, Landau, Pizzi, Theoretical Computer Science, 2016).
Design and development of a filtering-based approach for the fast computation of exact alignment-free similarity measures with mismatches (Pizzi, 2015), later extended to use heuristics for further speed-up (Pizzi, Algorithms for Molecular Biology, 2016).
Design and development of a linear-time algorithm for the computation of raw entropic profiles and of a quadratic-time algorithm for normalized entropic profiles (Comin and Antonello, 2014).
Design and development of linear-time algorithms for the computation of normalized entropic profiles (Parida, Pizzi, Rombo, 2014), further extended with a study of implementations on different data structures and an extensive comparative analysis (Pizzi, Ornamenti, Spangaro, Rombo, Parida, submitted).
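For reference, the longest common substring with at most k mismatches can be computed with a simple quadratic baseline. The sketch below is not the subquadratic algorithm of the cited paper: it is a plain O(n m) sliding window along each diagonal of the comparison matrix, and the function name and return format are assumptions made for this example.

```python
from collections import deque


def lcs_k_mismatches(x, y, k):
    """Return (length, start_x, start_y) of a longest pair of
    equal-length substrings of x and y at Hamming distance <= k.
    Quadratic-time baseline: one sliding window per diagonal."""
    n, m = len(x), len(y)
    best = (0, 0, 0)
    for d in range(-(m - 1), n):
        # The diagonal d = i - j starts at (d, 0) or at (0, -d).
        i, j = (d, 0) if d >= 0 else (0, -d)
        start = 0            # window start, as an offset on the diagonal
        mismatches = deque()  # offsets of the mismatches inside the window
        for t in range(min(n - i, m - j)):
            if x[i + t] != y[j + t]:
                mismatches.append(t)
                if len(mismatches) > k:
                    # Drop the oldest mismatch to restore the bound k.
                    start = mismatches.popleft() + 1
            window = t - start + 1
            if window > best[0]:
                best = (window, i + start, j + start)
    return best


if __name__ == "__main__":
    print(lcs_k_mismatches("GATTACA", "ATTAGA", 0))
    print(lcs_k_mismatches("GATTACA", "ATTAGA", 1))
```

Each diagonal is scanned once and every mismatch offset enters and leaves the queue at most once, so the total work is proportional to the number of cells in the comparison matrix.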

Prototypes and demonstrators

All the algorithms designed within the scope of the project have been implemented and tested on real biological data, and the results are discussed in the related papers. The software is available from the authors upon request.

Principal Investigators

Cinzia Pizzi

Simona E. Rombo

Fabio Fassetti