Metabolomics. 2015;11(2):323-339.
doi: 10.1007/s11306-014-0733-z. Epub 2014 Sep 19.

A 'rule of 0.5' for the metabolite-likeness of approved pharmaceutical drugs

Steve O Hagan et al. Metabolomics. 2015.

Abstract

We exploit the recent availability of a community reconstruction of the human metabolic network ('Recon2') to study how close, in structural terms, marketed drugs are to the nearest known metabolite(s) that Recon2 contains. While other encodings using different kinds of chemical fingerprints give greater differences, we find, using the 166 Public MDL Molecular Access (MACCS) keys, that 90% of marketed drugs have a Tanimoto similarity of more than 0.5 to the (structurally) 'nearest' human metabolite. This suggests a 'rule of 0.5' mnemonic for assessing the metabolite-like properties that characterise successful, marketed drugs. Multiobjective clustering leads to a similar conclusion, while artificial (synthetic) structures are seen to be less human-metabolite-like. This 'rule of 0.5' may have considerable predictive value in chemical biology and drug discovery, and may represent a powerful filter for decision-making processes.

Keywords: Cheminformatics; Drug-likeness; Genome-wide metabolic reconstruction; KNIME; Metabolite-likeness; Recon 2.
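
The 'rule of 0.5' described in the abstract can be illustrated with a minimal sketch. It assumes each fingerprint is represented as the set of its 'on' bit positions; the compound names and bit sets below are invented toy data, not MACCS keys of real molecules, and the helper names (`tanimoto`, `nearest_metabolite`, `fraction_above`) are hypothetical rather than taken from the paper's KNIME workflow.

```python
from typing import Dict, FrozenSet, Tuple

Fingerprint = FrozenSet[int]  # positions of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared 'on' bits divided by all 'on' bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / union


def nearest_metabolite(
    drug_fp: Fingerprint, metabolites: Dict[str, Fingerprint]
) -> Tuple[str, float]:
    """Return the name and similarity of the most similar metabolite."""
    best = max(metabolites, key=lambda name: tanimoto(drug_fp, metabolites[name]))
    return best, tanimoto(drug_fp, metabolites[best])


def fraction_above(
    drugs: Dict[str, Fingerprint],
    metabolites: Dict[str, Fingerprint],
    threshold: float = 0.5,
) -> float:
    """Fraction of drugs whose nearest metabolite exceeds the threshold."""
    hits = sum(
        1
        for fp in drugs.values()
        if nearest_metabolite(fp, metabolites)[1] > threshold
    )
    return hits / len(drugs)


# Toy data (hypothetical bit sets, for illustration only).
METABOLITES = {
    "met_A": frozenset({1, 2, 3, 4}),
    "met_B": frozenset({10, 11, 12}),
}
DRUGS = {
    "drug_X": frozenset({1, 2, 3, 5}),       # shares 3 of 5 bits with met_A
    "drug_Y": frozenset({10, 20, 21, 22}),   # shares 1 of 6 bits with met_B
}

if __name__ == "__main__":
    for name, fp in DRUGS.items():
        print(name, nearest_metabolite(fp, METABOLITES))
    print("fraction above 0.5:", fraction_above(DRUGS, METABOLITES))
```

With real data, the bit sets would come from a cheminformatics toolkit's MACCS implementation; the threshold comparison itself is unchanged.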

Figures

Fig. 1
Heat maps of the overall similarities between (a) Recon2 metabolites, (b) drugs and (c) each other. In panel c, the drugs lie on the X-axis and the metabolites on the Y-axis. Chemical structures were encoded using the MACCS encoding and Tanimoto distances calculated as described in Methods. The heat map representation (Eisen et al. 1998) encodes the numbers as a colour; in the present version, for ease of observation, we use ten discrete colours for the ten decades of Tanimoto similarity, with the colours chosen following the recommendations of Brewer et al. (1997) (see also http://www.colorbrewer2.org/). Also shown are hierarchical clusterings of the rows and columns (Eisen et al. 1998) using complete linkage and the default settings in the hclust function in R (Color figure online)
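
The ten-colour encoding mentioned in the Fig. 1 caption amounts to binning each similarity into one of ten intervals of width 0.1. A rough sketch follows; the bin-edge convention (lower edge inclusive, 1.0 placed in the top bin) is an assumption, as the caption does not state it, and the function names are hypothetical.

```python
from typing import List


def decile_bin(similarity: float) -> int:
    """Map a similarity in [0, 1] to a bin index 0..9 (one per discrete colour)."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError(f"similarity out of range: {similarity}")
    # A value of exactly 1.0 is placed in the top bin (index 9).
    return min(int(similarity * 10), 9)


def bin_matrix(similarities: List[List[float]]) -> List[List[int]]:
    """Apply decile_bin to every cell of a similarity matrix."""
    return [[decile_bin(s) for s in row] for row in similarities]


if __name__ == "__main__":
    toy = [[0.05, 0.95], [0.5, 1.0]]  # invented values
    print(bin_matrix(toy))
```

Each bin index would then be looked up in a ten-entry palette when drawing the heat map.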
Fig. 2
Different structural encodings produce different drug-metabolite distances. (a) Cumulative plots of nearest drug-metabolite Tanimoto distances using various fingerprints. The number of drugs with a Tanimoto similarity of 0.5 or smaller is arrowed; all of those to the right (ca. 90%) have a Tanimoto similarity greater than 0.5. (b) Scatter plots relating the nearest Tanimoto distance to a metabolite for each drug; when the closest metabolites are the same for both encodings they are coloured red. Correlation coefficients are as given. The blue histograms represent the distributions of Tanimoto similarities for each of the encodings (scaled to fit the relevant windows). (c) Cumulative numbers of metabolites with a Tanimoto similarity ≥0.5 for various drugs and encodings. (d) The variation of the numbers of metabolites with a Tanimoto similarity ≥0.5 for all drugs using the MACCS encoding, with some of the highest labelled by name and with the chemical structure of arbekacin, the ‘most promiscuously metabolite-like’ of all, shown. (e) The 14 least metabolite-like drugs when using the MACCS encoding. (f) An assessment of part of drug-metabolite space where drugs are largely but not entirely distant from metabolites (Color figure online)
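
The per-drug counts described for Fig. 2 (panels c and d) can be sketched as a simple tally over precomputed similarities. This is an illustration only: the drug names and similarity values are invented, and `count_similar` and `rank_by_count` are hypothetical helper names.

```python
from typing import Dict, List, Tuple


def count_similar(similarities: List[float], threshold: float = 0.5) -> int:
    """Number of metabolites whose similarity to one drug is >= threshold."""
    return sum(1 for s in similarities if s >= threshold)


def rank_by_count(
    sim_by_drug: Dict[str, List[float]], threshold: float = 0.5
) -> List[Tuple[str, int]]:
    """Drugs sorted from most to fewest metabolites at or above the threshold."""
    counts = {
        drug: count_similar(row, threshold) for drug, row in sim_by_drug.items()
    }
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Invented similarities of three toy drugs to four toy metabolites.
TOY_SIMILARITIES = {
    "drug_Q": [0.4, 0.55, 0.1, 0.3],
    "drug_P": [0.9, 0.6, 0.5, 0.2],
    "drug_R": [0.1, 0.2, 0.3, 0.4],
}

if __name__ == "__main__":
    for drug, n in rank_by_count(TOY_SIMILARITIES):
        print(f"{drug}: {n} metabolites with similarity >= 0.5")
```

A drug at the top of such a ranking corresponds to what the caption calls 'promiscuously metabolite-like'.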
Fig. 3
Variation of the Tanimoto similarity (TS) between a marketed drug, propranolol, and various metabolites. Those with a TS of over 0.5 are labelled, and structures are given for a representative set to illustrate the close chemical similarity (Color figure online)
Fig. 4
Drug-metabolite clustering using the MACCS encoding and MOCK, a multiobjective clustering algorithm. (a) Dependence of cluster numbers as the weightings of the two main objectives are varied. The ‘knees’ at cluster numbers of 2, 3, 7, 25, 30 and 64 are marked. (b) Cluster membership and its distribution between drugs and metabolites when 25 clusters are chosen. Data are ‘jittered’ in the Y direction to make them clearer (Color figure online)
Fig. 5
Properties of drugs and drug fragments. (a) Heat map illustrating marketed drug-compound distances of 2,000 drug fragments selected randomly from a Maybridge library (the plot looks very similar for 15,000 fragments). (b) Heat map illustrating metabolite-compound distances of 2,000 drug fragments selected randomly from a Maybridge library (the plot looks very similar for 15,000 fragments). (c) Cumulative plots of nearest marketed drug-compound or marketed drug-fragment Tanimoto distances for various libraries. (d) Distribution of molecular weights for the various datasets used (Color figure online)
