transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences

Olaf R P Bininda-Emonds. BMC Bioinformatics. 2005 Jun 22;6:156. doi: 10.1186/1471-2105-6-156.

Author

Olaf R P Bininda-Emonds¹

Affiliation

¹ Lehrstuhl für Tierzucht, Technical University of Munich, Hochfeldweg 1, 85354 Freising-Weihenstephan, Germany.

PMID: 15969769
PMCID: PMC1175081
DOI: 10.1186/1471-2105-6-156

Abstract

Background: Alignments of homologous DNA sequences are crucial for comparative genomics and phylogenetic analysis. However, multiple alignment represents a computationally difficult problem. For protein-coding DNA sequences, it is more advantageous in terms of both speed and accuracy to align the amino-acid sequences specified by the DNA sequences rather than the DNA sequences themselves. Many implementations making use of this concept of "translated alignments" are incomplete in the sense that they require the user to manually translate the DNA sequences and to perform the amino-acid alignment. As such, they are not well suited to large-scale automated alignments of large and/or numerous DNA data sets.
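The Python sketch below shows only the final step of a translated alignment. An amino-acid alignment is mapped back onto the original DNA by replacing each residue with its codon and each gap with three gap characters, so every gap in the resulting DNA alignment falls between codons. The function name `back_translate` and the example sequences are hypothetical. transAlign itself is a Perl script, so this illustrates the concept rather than the program's code.

```python
def back_translate(aligned_protein: str, dna: str, gap: str = "-") -> str:
    """Thread an aligned (gapped) protein back onto its ungapped coding DNA.

    Each residue consumes the next codon of `dna`; each gap becomes a
    codon-length gap, so the DNA alignment stays in reading frame.
    """
    if len(dna) % 3:
        raise ValueError("DNA length is not a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    residues = len(aligned_protein) - aligned_protein.count(gap)
    if residues != len(codons):
        raise ValueError(f"{residues} residues but {len(codons)} codons")
    codon_iter = iter(codons)
    return "".join(gap * 3 if aa == gap else next(codon_iter) for aa in aligned_protein)


if __name__ == "__main__":
    # Hypothetical two-sequence example. The amino-acid alignment is typed in
    # by hand here; in transAlign it would come from ClustalW.
    aligned_aa = {"seq1": "MK-L", "seq2": "M-RL"}
    raw_dna = {"seq1": "ATGAAACTG", "seq2": "ATGCGTTTA"}
    for name, aa in aligned_aa.items():
        print(name, back_translate(aa, raw_dna[name]))
    # seq1 ATGAAA---CTG
    # seq2 ATG---CGTTTA
```

The function refuses to proceed when the number of residues and codons disagree. In a real pipeline, that mismatch would point to a translation problem such as a frame shift.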

Results: transAlign is an open-source Perl script that aligns protein-coding DNA sequences via their amino-acid translations to take advantage of the superior multiple-alignment capabilities and speed of an amino-acid alignment. It operates by translating each DNA sequence into its corresponding amino-acid sequence, passing the entire matrix to ClustalW for alignment, and then back-translating the resulting amino-acid alignment to derive the aligned DNA sequences. In the translation step, transAlign determines the optimal orientation and reading frame for each DNA sequence according to the desired genetic code. It also checks for apparent frame shifts in the DNA sequences and can handle frame-shifted sequences in one of three ways (delete, align as amino acids regardless, or profile align as DNA). Comparative benchmarks derived from six protein-coding genes for mammals show that the strategy implemented in transAlign always improves the speed, and usually the apparent accuracy, of the alignment of protein-coding DNA sequences.
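The orientation and reading-frame step can be sketched in Python as follows. This is a minimal sketch under assumptions the abstract does not state. It uses only the standard genetic code (NCBI table 1), whereas transAlign lets the user choose the code. It scores each of the six candidate frames by its number of internal stop codons. It also treats a best frame that still contains internal stops as a sign of a possible frame shift. All function names are hypothetical and do not come from transAlign.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate the complete codons that start at offset `frame` (0, 1 or 2).

    Codons not in the table (e.g. containing N) become 'X'; stops are '*'.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3)
    )


def internal_stops(protein: str) -> int:
    """Count stop codons, ignoring any stops at the very end of the protein."""
    return protein.rstrip("*").count("*")


def best_frame(dna: str) -> tuple[str, int, str, int]:
    """Try all six frames; return (strand, frame, protein, internal stop count)
    for the one with the fewest internal stops (ties: first frame tried)."""
    candidates = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq, frame)
            candidates.append((strand, frame, protein, internal_stops(protein)))
    return min(candidates, key=lambda c: c[3])


if __name__ == "__main__":
    # Hypothetical input supplied in reverse orientation.
    strand, frame, protein, stops = best_frame("TTTAAGATAACCTTTACTTAGCCC")
    print(strand, frame, protein)  # - 0 GLSKGYLK
    if stops:
        # Assumed heuristic: no stop-free frame suggests a frame shift.
        print("possible frame shift: internal stops remain in every frame")
```

Only complete codons are translated, and a trailing stop codon is not penalized, because a full-length coding sequence normally ends in one. Frame offsets on the "-" strand are counted from the start of the reverse complement.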

Conclusion: transAlign represents one of the few full and cross-platform implementations of the concept of translated alignments. Both the advantages accruing from performing a translated alignment and the suite of user-definable options available in the program mean that transAlign is ideally suited for large-scale automated alignments of very large and/or very numerous protein-coding DNA data sets. However, the program's good performance also extends to the alignment of any set of protein-coding sequences. transAlign, including the source code, is freely available at http://www.tierzucht.tum.de/Bininda-Emonds/ (under "Programs").

Figures

**Figure 1**
Theoretical gain in speed from performing a translated alignment. The figure reveals there is always a performance advantage in aligning any given proportion of the protein-coding DNA sequences in a data set via their amino-acid translations with the remaining DNA sequences subsequently profile-aligned to them. The curve as shown is based on the assumption that the translated alignment is 9x faster, on average, than the respective DNA alignment; other values produce nearly identical curves of different scales.

Cited by

Seqotron: a user-friendly sequence editor for Mac OS X.
Fourment M, Holmes EC. BMC Res Notes. 2016 Feb 17;9:106. doi: 10.1186/s13104-016-1927-4. PMID: 26887850.
Soup to Tree: The Phylogeny of Beetles Inferred by Mitochondrial Metagenomics of a Bornean Rainforest Sample.
Crampton-Platt A, Timmermans MJ, Gimmel ML, Kutty SN, Cockerill TD, Vun Khen C, Vogler AP. Mol Biol Evol. 2015 Sep;32(9):2302-16. doi: 10.1093/molbev/msv111. Epub 2015 May 8. PMID: 25957318.
Molecular phylogeny of Polyneoptera (Insecta) inferred from expanded mitogenomic data.
Song N, Li H, Song F, Cai W. Sci Rep. 2016 Oct 26;6:36175. doi: 10.1038/srep36175. PMID: 27782189.
Some novel intron positions in conserved Drosophila genes are caused by intron sliding or tandem duplication.
Lehmann J, Eisenhardt C, Stadler PF, Krauss V. BMC Evol Biol. 2010 May 26;10:156. doi: 10.1186/1471-2148-10-156. PMID: 20500887.
Genomic determinants of protein evolution and polymorphism in Arabidopsis.
Slotte T, Bataillon T, Hansen TT, St Onge K, Wright SI, Schierup MH. Genome Biol Evol. 2011;3:1210-9. doi: 10.1093/gbe/evr094. Epub 2011 Sep 16. PMID: 21926095.
