Study sheds light on how to improve DNA sequencing analysis methods to ID organisms

Posted May 5, 2018

Designing a bioinformatics pipeline requires researchers to make a number of decisions related to how to analyze their raw DNA sequencing data, as shown in this conceptual diagram. Because various labs have developed their own bioinformatics pipelines to identify the organisms present in an environmental sample, SCCWRP coordinated an intercalibration study to understand how these labs could adjust their bioinformatics pipeline to improve the comparability of results.

SCCWRP and its partners have completed a two-year study that sheds light on best-practice approaches for analyzing raw DNA sequencing data to identify the organisms present in an environmental sample. The study is the latest step in an ongoing effort to adapt DNA barcoding technology for routine monitoring applications.

The intercalibration study, completed in May, found that laboratories can use many of their own algorithms, data filters and reference databases to analyze raw DNA sequencing data, and still get comparable, high-quality results. Study participants recommended relatively minor adjustments to these DNA analysis protocols – known as bioinformatics pipelines – to decrease variability in the results.

Researchers already have made significant progress toward adapting DNA-based taxonomic identification methods for routine environmental monitoring, including developing methods for sample collection, sample processing and selection of genetic barcode targets. DNA barcoding has been shown to be cheaper, faster and, in some cases, more reliable than manual taxonomic identification under a microscope.

During the study, researchers focused on identifying potential challenges associated with labs using their own bioinformatics pipelines to analyze raw DNA sequencing data. Bioinformatics pipelines consist of a series of detailed steps and decision points that enable labs to rapidly process millions of reads of raw sequencing data, and ultimately make a determination about the types of organisms present in a given environmental sample.

While developing a single standardized bioinformatics pipeline for taxonomic identifications might be an eventual goal, labs need to have flexibility to continually adjust their DNA analysis methods because of the speed with which the field of bioinformatics is advancing.

The intercalibration study involved participation from five of the world’s leading labs that have developed a bioinformatics pipeline to conduct taxonomic identifications: Australia’s Macquarie University, Canada’s University of Guelph, the U.S. Environmental Protection Agency’s Office of Research and Development, the National Oceanic and Atmospheric Administration’s Atlantic Oceanographic and Meteorological Laboratory, and SCCWRP. SCCWRP also coordinated the intercalibration exercise.

Each of the five labs received the same set of raw DNA sequencing data, which was obtained from small, sediment-dwelling organisms known as benthic meiofauna in a coastal sediment sample. Some of the sediment samples were treated with a pesticide to examine whether the labs could distinguish changes in the composition of the meiofauna community.

Each lab independently analyzed the raw sequencing data using its own bioinformatics analysis pipeline. The pipelines encompass steps such as error control, sequence alignment and gene annotation.
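To give a sense of what an error-control step can look like, the Python sketch below drops reads whose average base quality falls below a cutoff. It is only an illustration, not any participating lab's method: the function names, the cutoff of 20 and the assumption of Phred+33 quality encoding are hypothetical choices made for this example.

```python
def mean_phred(quality_string, offset=33):
    """Average Phred score of a read, assuming Phred+33 ASCII encoding."""
    if not quality_string:
        raise ValueError("quality string must not be empty")
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def filter_reads(reads, min_mean_quality=20.0):
    """Keep (sequence, quality) pairs whose mean quality meets the cutoff.

    The cutoff of 20 is an illustrative assumption, not a recommendation
    from the study.
    """
    return [
        (seq, qual)
        for seq, qual in reads
        if mean_phred(qual) >= min_mean_quality
    ]


# Hypothetical reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example_reads = [("ACGT", "IIII"), ("ACGT", "####")]
kept_reads = filter_reads(example_reads)
```

Two labs that pick different cutoffs at a step like this pass different sets of reads to every later step, which is one way pipelines can diverge.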

All of the participating labs were able to distinguish the meiofauna community in the pesticide-treated sediment from that of the control. Following a reconciliation step, the five labs achieved about 80% correlation in identifying the types of organisms present in the meiofauna community; there also was about 80% correlation with direct comparisons of each lab’s DNA sequences. By contrast, a traditional taxonomic identification might achieve about 60% correlation.
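One way to picture a between-lab comparison of this kind is to line up each lab's taxon counts and compute a correlation for every pair of labs. The Python sketch below does that with a Pearson correlation. The choice of Pearson, the function names and the example counts are all assumptions for illustration; the statistic actually used in the study is not described here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (sd_x * sd_y)


def pairwise_lab_correlation(tables):
    """Correlate every pair of labs over the union of reported taxa.

    `tables` maps a lab name to a {taxon: read count} dict; a taxon a lab
    did not report is counted as zero for that lab.
    """
    taxa = sorted(set().union(*tables.values()))
    vectors = {
        lab: [counts.get(taxon, 0) for taxon in taxa]
        for lab, counts in tables.items()
    }
    return {
        (a, b): pearson(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


# Hypothetical counts, not data from the study.
example_tables = {
    "lab_a": {"Nematoda": 10, "Copepoda": 5, "Annelida": 1},
    "lab_b": {"Nematoda": 20, "Copepoda": 10, "Annelida": 2},
    "lab_c": {"Nematoda": 1, "Copepoda": 5, "Ostracoda": 10},
}
correlations = pairwise_lab_correlation(example_tables)
```

In this toy data, lab_b reports exactly twice lab_a's counts, so that pair correlates perfectly, while lab_c reports a different community and correlates poorly with both.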

Researchers also examined whether the five labs could distinguish the composition of organisms living at various specific locations along two rivers that terminate at an inland wetland. Using a different DNA marker to guide their identification analyses, all five labs were able to distinguish the two rivers from each other, as well as to distinguish among individual sites along each river. The labs achieved strong correlation with direct comparisons of their DNA sequencing analyses, as well as with identifying the types of organisms present.

The participating laboratories developed multiple recommendations for decreasing variability among bioinformatics pipelines, including how sequences are identified and assigned to samples, how taxa are defined based on sequence similarity, and the importance of developing a hand-curated taxonomic identification database.
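The recommendation about defining taxa by sequence similarity can be illustrated with a toy example. The Python sketch below groups sequences with a simple greedy scheme: each sequence joins the first existing group whose representative it matches at or above a similarity threshold, and otherwise starts a new group. It assumes equal-length, already-aligned sequences and uses made-up function names and thresholds, so it is a simplification and not the method of any lab in the study.

```python
def identity(seq_a, seq_b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def greedy_cluster(sequences, threshold):
    """Group sequences by similarity to each cluster's first member.

    A sequence joins the first cluster whose representative it matches at
    or above `threshold`; otherwise it becomes a new representative.
    """
    representatives = []
    clusters = []
    for seq in sequences:
        for index, rep in enumerate(representatives):
            if identity(seq, rep) >= threshold:
                clusters[index].append(seq)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            representatives.append(seq)
            clusters.append([seq])
    return clusters


# Hypothetical sequences: the first two differ at one of ten positions.
example_sequences = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"]
loose_clusters = greedy_cluster(example_sequences, threshold=0.9)
strict_clusters = greedy_cluster(example_sequences, threshold=1.0)
```

The same three sequences form two groups at a 90% threshold and three groups at 100%, which shows why labs that define taxa at different similarity levels can report different numbers of taxa from identical data.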

DNA-based species identification methods already are being incorporated into routine environmental sampling efforts across California. For example, the Southern California Stormwater Monitoring Coalition has begun identifying stream algae samples via DNA analysis, and the Algal Stream Condition Index – a statewide tool for scoring stream health based on the composition of the algal community – will rely on DNA to identify algal species. Thus, it is important that multiple independent laboratories can produce consistent, high-quality data after applying a few relatively minor quality-control measures.

For more information, contact Dr. Joshua Steele.

