Obj1: Characterization of Omics Data

We will design efficient and expressive descriptors for omics data based on their own composition (e.g., substrings or subgraphs). These descriptors should capture statistical and biological information, include only the necessary instances, and be usable in realistic biological settings. Such descriptors are a primitive of paramount interest in several application contexts (e.g., clustering, classification, and motif extraction). We will study specific classes of motifs (e.g., interesting patterns that may leave some elements unspecified) and models for both biological sequences and biological networks in the context of omics data.
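As a minimal illustration of a composition-based descriptor, the sketch below computes the relative frequencies of all length-k substrings (k-mers) of a sequence. It is only one simple example of the kind of descriptor discussed above, not the descriptors the project will propose; the function name `kmer_profile` and the default `k = 3` are assumptions made for illustration.

```python
from collections import Counter


def kmer_profile(sequence: str, k: int = 3) -> dict:
    """Return the relative frequency of every length-k substring of `sequence`.

    Hypothetical helper for illustration: the profile is a compact,
    alignment-free description of the sequence's own composition.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # Slide a window of width k over the sequence and count each substring.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    # Sequences shorter than k contain no k-mers, so return an empty profile.
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


profile = kmer_profile("ACGACG", k=3)
```

Here `profile` maps each distinct 3-mer of the input to its share of all 3-mer occurrences, so two sequences can be compared through their profiles rather than through the raw strings.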

Obj2: Analysis of Omics Data

We will provide efficient techniques for analyzing omics data that exploit the characterizations developed in the first objective to improve data analysis. By allowing direct computation on HTS data in succinct form, we expect that omics data modelled by compact classes of suitable descriptors can be analyzed more efficiently and effectively. We will propose specific algorithms and/or study how to adapt classical techniques in the following contexts:

  • sequence classification
  • anomaly detection
  • network alignment
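To make the first of these contexts concrete, the sketch below adapts a classical technique (nearest-centroid classification with cosine similarity) to compact k-mer count descriptors. It is a toy example under stated assumptions, not one of the algorithms the project will propose; the helper names, the choice `k = 2`, and the example labels are all hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int = 2) -> Counter:
    """Count all length-k substrings of `seq` (hypothetical descriptor)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(query: str, labelled: dict, k: int = 2):
    """Return the label whose summed k-mer counts are most similar to `query`.

    `labelled` maps each class label to a list of example sequences.
    """
    q = kmer_counts(query, k)
    best_label, best_score = None, -1.0
    for label, seqs in labelled.items():
        # The class centroid is the sum of the k-mer counts of its examples.
        centroid = Counter()
        for s in seqs:
            centroid.update(kmer_counts(s, k))
        score = cosine(q, centroid)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented purely for illustration.
training = {
    "at_rich": ["ATATAT", "TATATA"],
    "gc_rich": ["GCGCGC", "CGCGCG"],
}
label = classify("ATATTA", training)
```

The classifier only ever touches the descriptors, never the raw sequences, which is the property the objective aims to exploit at scale.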

Obj3: Prototypes and Demonstrators

To validate and disseminate the results of the project in terms of both research contribution and practical impact, we will provide software prototypes and demonstrators of the proposed techniques. The related activities will be spread throughout the whole project, since the implementation and validation of the proposed approaches will be needed at various stages of our research.