Combining DNA methylation and RNA sequencing data of cancer for supervised knowledge extraction

Advances in Next Generation Sequencing (NGS) techniques have left bioinformaticians with large amounts of genomic and clinical data, which are growing exponentially. A striking example is The Cancer Genome Atlas (TCGA), which aims to provide a comprehensive archive of biomedical data about tumors. TCGA contains more than 15 TB of genomic and clinical data, whose analysis and interpretation pose great challenges to the bioinformatics community. In this work, we focus on the combination and analysis of NGS data extracted from TCGA. In particular, we combine RNA-seq and DNA-methylation experiments and perform a supervised classification analysis. This combination allows us to distinguish tumoral samples from normal ones and to extract reliable rule-based classification models that contain salient features (i.e., genes and methylated sites). These features, which are related to the investigated tumor, can be studied by domain experts to obtain new knowledge about cancer. Finally, the proposed combination and analysis method can be applied in further studies on different data sources and NGS experiments.
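
As an illustration of the general idea only (not the authors' pipeline or software), the following minimal Python sketch joins two per-sample feature tables on their shared sample identifiers and then searches for a single threshold rule that separates tumoral from normal samples. All sample identifiers, feature names (GENE_A, GENE_B, cg_site_1), values, and the helper functions combine and best_rule are hypothetical and invented for this example; a real analysis would use TCGA data and a full rule-based classifier rather than a one-feature rule.

```python
from typing import Dict, Optional, Tuple

Table = Dict[str, Dict[str, float]]  # sample ID -> {feature name -> value}


def combine(rna: Table, meth: Table) -> Table:
    """Join two tables on shared sample IDs, prefixing features by source."""
    combined: Table = {}
    for sid in sorted(set(rna) & set(meth)):
        row = {f"rna:{gene}": v for gene, v in rna[sid].items()}
        row.update({f"meth:{site}": v for site, v in meth[sid].items()})
        combined[sid] = row
    return combined


def best_rule(
    data: Table, labels: Dict[str, str]
) -> Optional[Tuple[str, float, bool, float]]:
    """Exhaustively search rules of the form 'feature > threshold'.

    Returns (feature, threshold, tumoral_if_above, accuracy) for the first
    rule with the highest training accuracy, or None if no rule exists.
    """
    best = None
    features = sorted(next(iter(data.values())))
    for feat in features:
        values = sorted({row[feat] for row in data.values()})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            for tumoral_if_above in (True, False):
                correct = sum(
                    ((row[feat] > thr) == tumoral_if_above)
                    == (labels[sid] == "tumoral")
                    for sid, row in data.items()
                )
                acc = correct / len(data)
                if best is None or acc > best[3]:
                    best = (feat, thr, tumoral_if_above, acc)
    return best


# Toy, made-up values (NOT real TCGA measurements).
rna = {
    "s1": {"GENE_A": 12.0, "GENE_B": 3.0},
    "s2": {"GENE_A": 11.0, "GENE_B": 8.0},
    "s3": {"GENE_A": 2.0, "GENE_B": 4.0},
    "s4": {"GENE_A": 1.5, "GENE_B": 7.0},
    "s5": {"GENE_A": 9.0, "GENE_B": 5.0},  # no methylation data: dropped
}
meth = {
    "s1": {"cg_site_1": 0.9},
    "s2": {"cg_site_1": 0.8},
    "s3": {"cg_site_1": 0.2},
    "s4": {"cg_site_1": 0.1},
}
labels = {"s1": "tumoral", "s2": "tumoral", "s3": "normal", "s4": "normal"}

combined = combine(rna, meth)
rule = best_rule(combined, labels)
if rule is not None:
    feat, thr, above, acc = rule
    side = ">" if above else "<="
    print(f"IF {feat} {side} {thr} THEN tumoral (training accuracy {acc:.2f})")
```

Prefixing each feature with its source keeps genes and methylated sites distinguishable in the extracted rule, which matters when the rule is later inspected by a domain expert. The accuracy reported here is measured on the training samples only; any real evaluation would need held-out data or cross-validation.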






    E. Cappelli, G. Felici, and E. Weitschek: Combining DNA methylation and RNA sequencing data of cancer for supervised knowledge extraction. BioData Mining, 2017 (submitted).


Eleonora Cappelli, Department of Engineering, Roma Tre University.
Emanuel Weitschek, Department of Engineering, Uninettuno International University.