FLASH: fast length adjustment of short reads to improve genome assemblies
- PMID: 21903629
- PMCID: PMC3198573
- DOI: 10.1093/bioinformatics/btr507
Abstract
Motivation: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome.
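To illustrate the idea of overlapping and merging a read pair, here is a minimal Python sketch. It is not FLASH's implementation (which is written in C); the function names, the `min_overlap` and `max_mismatch_ratio` parameters and their default values are assumptions chosen for illustration, and base quality scores are ignored. For two reads of length r drawn from opposite ends of a fragment of length f < 2r, the expected overlap is 2r - f bases.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10, max_mismatch_ratio=0.25):
    """Merge a paired-end read pair by overlapping the end of read1 with
    the start of the reverse-complemented read2.

    Hypothetical parameters (illustrative assumptions, not taken from the paper):
      min_overlap        -- shortest overlap that is considered
      max_mismatch_ratio -- largest tolerated mismatches / overlap length
    Returns the merged sequence, or None if no acceptable overlap exists.
    """
    mate = reverse_complement(read2)
    best = None  # (mismatch_ratio, overlap_length)
    longest = min(len(read1), len(mate))
    # Try every overlap length, longest first, and keep the one with the
    # lowest mismatch ratio; ties go to the longer overlap.
    for overlap in range(longest, min_overlap - 1, -1):
        tail = read1[-overlap:]
        head = mate[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        ratio = mismatches / overlap
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, overlap)
    if best is None:
        return None
    # Simplification: where the reads disagree, keep read1's base.
    return read1 + mate[best[1]:]


if __name__ == "__main__":
    fragment = "ACGTTGCAAGCTTAGGCATCGATTCGGATA"  # 30 bp fragment
    r1 = fragment[:20]                           # forward read, 20 bp
    r2 = reverse_complement(fragment[-20:])      # reverse read, 20 bp
    print(merge_pair(r1, r2) == fragment)
```

In this example the reads are 20 bp and the fragment is 30 bp, so the two reads share 2 x 20 - 30 = 10 bases and the merged read spans the whole fragment.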
Results: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds.
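The N50 statistic used above to compare assemblies can be computed as in the following sketch. It assumes the common definition in which N50 is the length of the shortest contig (or scaffold) in the smallest set of longest sequences that together cover at least half of the total assembled length; some studies normalize by an estimated genome size instead. The function name `n50` is chosen here for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length. Returns 0 for an empty collection.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    # Total length is 54, so half is 27; 10 + 9 + 8 = 27 reaches it.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```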
Availability and implementation: The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash.
Contact: [email protected].