The UNCseq study aims to associate known molecular alterations with clinical outcomes in oncology and to use this information to support treatment decisions by reporting genetic profiling results to clinicians. The diverse informatics requirements of this study demonstrate the breadth of Big Data Science at UNC. The LDBR identifies prospective patients by cross-referencing patient schedules with available tissue in TPF. Consent and study variables are tracked in a custom clinical database. LIMS tracks sample requests to TPF, the transfer of samples to the Genomic Pathology Lab for processing, and their subsequent transfer to the High Throughput Sequencing Facility.
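The cross-referencing step described above can be illustrated with a minimal sketch. It assumes a simplified data model: the `Appointment` record, the `find_prospective_patients` function, and the patient identifiers are hypothetical and do not reflect the actual LDBR schema or implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Appointment:
    """Hypothetical clinic schedule entry (not the real LDBR schema)."""
    patient_id: str
    clinic: str
    visit_date: date


def find_prospective_patients(appointments, patients_with_tissue):
    """Return appointments for patients who also have banked tissue.

    Results are ordered by visit date, then patient ID, so that the
    earliest upcoming visits appear first.
    """
    matches = [a for a in appointments if a.patient_id in patients_with_tissue]
    return sorted(matches, key=lambda a: (a.visit_date, a.patient_id))


# Illustrative data only.
appointments = [
    Appointment("P001", "Oncology", date(2015, 3, 2)),
    Appointment("P002", "Oncology", date(2015, 3, 1)),
    Appointment("P003", "Oncology", date(2015, 3, 3)),
]
patients_with_tissue = {"P001", "P003"}  # patient IDs with tissue available

prospects = find_prospective_patients(appointments, patients_with_tissue)
print([a.patient_id for a in prospects])  # ['P001', 'P003']
```

A set is used for the tissue inventory so that each membership check is constant time, which matters when schedules are screened daily against a large inventory.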

Once sequencing is complete, LDBR links the samples to the clinical database so that the tumor and paired normal samples can be identified for processing. The UNCseq analytical workflow then proceeds in an automated fashion to generate reports that are presented at the Molecular Tumor Board. Upon review, conference summaries are stored in the clinical database along with outcome tracking. This project has spurred the development of algorithms and computational methodologies for accurate identification of DNA sequence aberrations specific to an individual cancer genome. Specifically, the ABRA algorithm was developed to provide accurate somatic SNV and indel identification for this project. Additional methods have been developed to characterize DNA copy number, identify pathogenic viruses, and discover translocations from the targeted capture sequencing results. A total of 429 patients have completed this study, and 56% have been found to harbor actionable results.
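The tumor and paired normal identification mentioned above can be sketched as a simple grouping of sample records by patient. This is an illustration under stated assumptions: the field names (`patient_id`, `tissue_type`, `sample_id`) and the function `pair_tumor_normal` are hypothetical, and the actual linkage performed by LDBR is not described in this report.

```python
from collections import defaultdict


def pair_tumor_normal(samples):
    """Return {patient_id: (tumor_sample_id, normal_sample_id)}.

    Only patients with both a tumor and a normal sample are included.
    Assumption: at most one sample of each type per patient; if there
    are several, the last one seen is kept.
    """
    by_patient = defaultdict(dict)
    for sample in samples:
        by_patient[sample["patient_id"]][sample["tissue_type"]] = sample["sample_id"]
    return {
        patient_id: (types["tumor"], types["normal"])
        for patient_id, types in by_patient.items()
        if "tumor" in types and "normal" in types
    }


# Illustrative records only.
samples = [
    {"sample_id": "S1", "patient_id": "P001", "tissue_type": "tumor"},
    {"sample_id": "S2", "patient_id": "P001", "tissue_type": "normal"},
    {"sample_id": "S3", "patient_id": "P002", "tissue_type": "tumor"},
]

pairs = pair_tumor_normal(samples)
print(pairs)  # {'P001': ('S1', 'S2')}
```

Patient P002 is omitted because no normal sample is available, so somatic calling against a matched normal could not proceed for that case.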
The Lineberger Bioinformatics Core has enabled software functions in LDBR and CPR that helped improve operations for the Health Registry project. CPR loads consents from the Health Registry Survivor database and Teleforms in a daily batch. The LDBR then provides cohort discovery reports for Health Registry by integrating clinic schedules and ICD-9 diagnosis codes of potential patients from Carolina Data Warehouse data feeds. Integration of operating room schedules and consent tracking by LDBR and CPR has improved the operational efficiency of the Tissue Procurement Facility. The success of this system was leveraged to overcome procurement issues for multiple projects, including UNCseq.
The large-scale infrastructure, software, and systems developed to support large institutional projects such as TCGA, UNCseq, and Health Registry have been extended to support basic and clinical research across UNC.
The Bioinformatics Core has continued implementing a common LIMS across various facilities at the Cancer Center. It has installed and now maintains LIMS in the Berg lab to support NCGENES, the Mouse Phase I unit, the Genomic Pathology Lab, the Tissue Procurement Facility, and the RAM lab. This achievement has allowed numerous labs to efficiently manage the processing and analysis of thousands of samples. The workflow management tools developed for UNCseq have been broadly used to support over 20 different sequencing workflows. The workflow manager decomposes complicated processes into manageable components for simultaneous analysis on our common compute infrastructure. The workflows are diverse and include methods to determine sequence variation in germline cells, make mutation calls between tumor and normal cells, quantify RNA, and identify transcription factor binding sites. Many workflows within the engine automatically post results to an encrypted website specific to each lab. At this point, analysts in the Bioinformatics Core work with individual scientists to help achieve their research goals. The results of these collaborations can be seen in 119 peer-reviewed publications with 20+ labs over the last five years in which LBC data scientists made author-level contributions.
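The decomposition of a workflow into components that can run simultaneously, as described above, can be sketched with a dependency graph. This is not the UNCseq workflow manager itself: the step names and the `execution_batches` helper are hypothetical, and the sketch uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) in place of whatever scheduler the core actually uses.

```python
from graphlib import TopologicalSorter


def execution_batches(dependencies):
    """Group workflow steps into batches that can run in parallel.

    `dependencies` maps each step to the set of steps it depends on.
    Every step in a batch has all of its prerequisites in earlier batches.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        batches.append(ready)
        sorter.done(*ready)  # mark finished so dependent steps become ready
    return batches


# Hypothetical tumor/normal workflow: step -> prerequisite steps.
workflow = {
    "align_tumor": set(),
    "align_normal": set(),
    "call_somatic_variants": {"align_tumor", "align_normal"},
    "copy_number": {"align_tumor", "align_normal"},
    "report": {"call_somatic_variants", "copy_number"},
}

batches = execution_batches(workflow)
for number, steps in enumerate(batches, start=1):
    print(number, steps)
```

In this example the two alignment steps form the first batch, variant calling and copy number analysis form the second, and report generation runs last. In a real deployment each batch would be dispatched to the compute cluster rather than printed.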