ICC 2019

Luigi Marongiu

University of Heidelberg, Germany
Title: Massively parallel sequencing datasets for benchmarking virus integration tools


Virus integration is increasingly recognized as a major risk factor in carcinogenesis. Massively parallel sequencing (MPS) has become one of the most popular methods for identifying viral genomes in human samples, and several tools have been designed to detect viral integrants in MPS data. However, there is a lack of datasets containing well-defined viral integrants that can be used to test the sensitivity and specificity of such tools. I developed a group of fastq files, collectively called ‘SImulated Sequences Mimicking Integration’ (SISMI), to benchmark bioinformatics tools designed to identify viral integration from MPS data. The SISMI files were conceived to simulate MPS analysis of human genomes containing viral integrants at exact loci, so that the accuracy of the tools can be assessed.
The SISMI files were constructed as follows. The human genome build GRCh38.92 was used as a base, upon which selected viral sequences were placed at well-defined positions following integration events reported in the literature. Background genetic variation was introduced by simulating random single-nucleotide mutations with EMBOSS’ mbase, as well as larger structural variants. The resulting sequences were converted to fastq format using ART, overall simulating an Illumina HiSeq MPS analysis. The SISMI fastq files were aligned to the human reference genome with BWA-MEM, deduplication was performed with Sambamba, and the mapping was visualized with the Integrative Genomics Viewer (IGV). Structural genomic variation was analyzed with Delly.
The SISMI files were designed at different levels of complexity. The first level (SISMI0) included a single viral integration in the sequence of the human mitochondrion; this level was intended to provide a toy set for building bioinformatics pipelines and had a read coverage of 30×. The second level (SISMI1) was obtained by inserting several viral sequences on chromosome 21, and was envisioned to obtain pilot data on well-defined pipelines. The insertions were intended to mimic events that are hard to define using MPS analysis, such as inversions and repetitions, and had a read coverage of 100×. The third level (SISMI2) was obtained by distributing the SISMI1 insertions across different human chromosomes. This level mimicked a complete human genome experiment and was split into two subsets: normal (without viral insertions) and abnormal (with viral insertions), both with a read coverage of 100×.
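The construction idea described in the abstract can be illustrated with a minimal Python sketch. It is not the code used to build SISMI: the function names, the toy sequences, the mutation rate, and the 150 bp paired-end read length are all assumptions made for illustration. It shows how a viral sequence might be placed at a known locus (optionally inverted), how random single-nucleotide changes might be added as background variation, and how many read pairs a simulator would need to generate to reach a target mean coverage.

```python
import math
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def insert_virus(host, virus, position, invert=False):
    """Insert `virus` into `host` before the 0-based `position`.

    With invert=True the viral sequence is reverse-complemented first,
    mimicking an inverted integration event.
    """
    if not 0 <= position <= len(host):
        raise ValueError("position outside host sequence")
    if invert:
        virus = virus.translate(COMPLEMENT)[::-1]
    return host[:position] + virus + host[position:]


def add_snvs(seq, rate, rng):
    """Replace each A/C/G/T base by a different base with probability `rate`."""
    mutated = []
    for base in seq:
        if base in "ACGT" and rng.random() < rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        mutated.append(base)
    return "".join(mutated)


def read_pairs_for_coverage(genome_length, coverage, read_length):
    """Read pairs needed for a mean coverage C, from C = 2 * N * L / G."""
    return math.ceil(coverage * genome_length / (2 * read_length))


if __name__ == "__main__":
    rng = random.Random(42)           # fixed seed so the toy run is repeatable
    host = "ACGT" * 25                # hypothetical 100 bp host fragment
    virus = "TTTTGGGG"                # hypothetical viral fragment
    chimera = insert_virus(host, virus, position=40)
    mutated = add_snvs(chimera, rate=0.01, rng=rng)
    print(len(host), len(chimera), len(mutated))
    # Human mitochondrial genome (16,569 bp) at 30x, assuming 2 x 150 bp reads.
    print(read_pairs_for_coverage(16569, 30, 150))
```

Because the insertion position is chosen by the user, the expected breakpoint coordinates are known in advance, which is what allows a detection tool's calls to be scored for sensitivity and specificity.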


Luigi Marongiu obtained a PhD at University College London for work on the use of the human papillomavirus genome as a biomarker for the identification of cervical cancer lesions. He worked at the University of Cambridge (England) on nosocomial norovirus infections and at the University of Edinburgh (Scotland) on veterinary viruses. He is currently based at the University of Heidelberg, Faculty of Medicine in Mannheim (Germany), where he assesses the role of viral infections in the development of cancer and metastasis. He combines wet-lab work with bioinformatics analysis to develop models that could predict oncogenesis.