Xiaoyu He is pursuing a Ph.D.; her research interests include high-performance computing and bioinformatics.
Abstract
The high throughput and increasing affordability of Next-Generation Sequencing (NGS) have greatly improved our ability to detect genomic variants. Although reliable results and rapid processing are both critical in biomedical research and clinical applications, the accuracy of variant detection is the core concern of researchers. The characteristics of the data are the decisive factor in algorithm and parameter selection; however, popular and widely used pipelines, such as GATK Best Practices, Genome VIP, and integrated software like SpeedSeq, often lack detailed parameter guidance, which can mislead inexperienced analysts. In addition, quality management throughout the pipeline is severely overlooked, and quality control is often applied only to the raw sequencing data. Therefore, a detailed and comprehensive analysis pipeline with quality management at each step is of significant importance. In this study, we discuss and summarize how to choose software and parameters according to the data characteristics. We also clarify current quality management strategies for Illumina technology-based sequencing data at different stages of analysis, based on the idea of an information gain pattern: as the analysis proceeds, new information emerges, and new, more detailed and fine-grained metrics should be defined to ensure accurate results.