Abstract
With the rapid accumulation of omics-scale data produced by high-throughput technologies, compact and expressive representations, coupled with efficient algorithms, are no longer merely desirable features but a necessity for extracting meaningful information from this data deluge. This research project proposes a synergistic design of models for data characterization and algorithms for data mining, in order to infer hidden information from omics data. In a selected number of specific contexts, this will allow us to: (obj1) obtain a more meaningful and essential description of the data, possibly including the functional or structural information carried by omics data; (obj2) consequently devise the best possible strategy to analyze the data, through the development of specialized algorithms. As a further contribution (obj3), the proposed techniques will be implemented as prototypes for bioinformatics analysis, both to validate the results and to disseminate them to the interested communities. Through these innovative technological contributions, the project also aims to attract potential industrial collaborations (Horizon2020 Priority 2-Industrial Leadership).
The project will be carried out by three units, UNIPD, UNIPA, and UNICAL, which bring together complementary expertise in the manipulation of strings and graphs and in the development of mining algorithms that discover both regularities and anomalies, all of which are essential to pursuing the main goals of the project.
The quality of the team's research is attested by the international relevance of several contributions to biological analysis and big data mining (some of which are the outcome of spontaneous collaborations among team members). Moreover, members of the units have ongoing collaborations with prestigious international Computer Science, Mathematics, Biology and Bioinformatics institutes. The competencies of our consortium demonstrate both the team's potential to drive the project beyond the state of the art (in line with Horizon2020 Priority 1-Excellent Science) and a solid interpersonal framework, which is the foundation of a fruitful collaboration.
To pursue the objectives of the project, the activities will be organized into four Work Packages (WPs):
- WP1: Compositional characterization of sequences
- WP2: Compositional characterization of networks
- WP3: Computational methods for the analysis of omics data
- WP4: Prototypes, Testing, and Experiments
Each unit will be mainly involved in one WP, according to its primary expertise, will collaborate on specific tasks of the other WPs, and will participate in the development of the software prototypes. Finally, the research and innovation at the core of the project target the development of advanced computational techniques that will hopefully help biomedical investigation achieve the scientific breakthroughs needed to tackle the urgent challenges society faces, in particular those concerning health (Horizon2020 Priority 3-Societal Challenges).
