SIO1003
BIOINFORMATICS CONCEPTS
Semester 1 Session 2024/2025
SIO1003 | Bioinformatics Concepts
ATTENDANCE → via SPECTRUM
MARK YOUR ATTENDANCE ONLINE
WE WON’T ENTERTAIN REQUESTS TO REOPEN
THE ATTENDANCE IF YOU MISSED SIGNING IN.
SO, PLEASE BE RESPONSIBLE FOR YOUR OWN
ATTENDANCE
DO IT NOW!
ADD YOURSELF INTO THIS WHATSAPP GROUP NOW
TOTAL STUDENTS SO FAR IN WEEK 1 = 100 STUDENTS
Lecture venue: Online (all)

Lectures W1-W7
Occ 2 Practicals (Biotech, MGM)
Dr. Nikman Adli Nor Hashim
Practical: Online
[email protected]

Lectures W8-W14
Occ 1 Practicals (Biotech)
Dr. Vijayan Manickam Achari
Practical: Online
[email protected]

Occ 3 Practicals (SPAS)
Dr. Farahaniza Supandi
Practical: Online
[email protected]

Timetable:
Mon 12.00 pm – 12.50 pm (Lectures)
Tue 2.00 pm – 4.50 pm (Occ1 & Occ2)
Wed 2.00 pm – 4.50 pm (Occ3)
Course Learning Outcomes
Describe the basic concepts of bioinformatics
Manipulate suitable bioinformatics resources to
solve biological problems
Operate common bioinformatics software and
applications
Course structure
Week 1 Course introduction. Central Dogma. Introduction to Bioinformatics
Week 2 DNA sequencing & The Human Genome Project
Week 3 Biological and Bioinformatics tools and databases. Part 1
Week 4 Biological and Bioinformatics tools and databases. Part 2 CA – Practical 1 (8%)
Week 5 Gene ontology CA – Practical 2 (8%)
Week 6 Molecular evolution
Week 7 Pairwise sequence alignment CA – Practical 3 (8%)
Mid-semester Break 25.11.2024 - 01.12.2024
Week 8 Database similarity search BLAST. Part 1 CA – Test 1 (10%)
Week 9 Database similarity search BLAST. Part 2 CA – Practical 4 (8%)
Week 10 Molecular phylogenetics
Week 11 Multiple sequence alignment CA – Practical 5 (8%)
Week 12 Introduction to structural biology and Computer-Aided Drug Design
Week 13 Application and future in Bioinformatics AA – TBC (20%); CA – Recorded group presentation (10%)
Week 14 Revision AA – TBC (20%)
Course assessment
• 60% continuous assessment
• Weeks 4, 5, 7, 9, 11 – Practical reports (8% x 5 = 40%)
• Week 8 – Mid-Sem Test (10%)
• Week 13 – Recorded group presentation (10%)
• 40% Alternative assessment
• Week 13 – TBC (20%)
• Week 14 – TBC (20%)
Introduction to Bioinformatics
Bioinformatics in the post-genomic era
What is Bioinformatics?
• Bioinformatics is an interdisciplinary field of science in which biology, computer
science, and information technology merge to form a single discipline
• … Bioinformatics is a hybrid of biology and computer science
• … Bioinformatics is computer aided biology!
[Figure: BIOINFORMATICS at the centre, surrounded by COMPUTER SCIENCE, ENGINEERING, CHEMISTRY,
MATHEMATICS, BIOCHEMISTRY, STATISTICS and BIOLOGY]
Definition
• Historically, the term bioinformatics did not mean what it means today.
• Paulien Hogeweg and Ben Hesper initially coined the term “bioinformatica” in 1970 to refer
to the study of information processes in biotic systems. This definition placed bioinformatics
as a field parallel to biochemistry (the study of chemical processes in biological systems).
Definition
Today’s definition:
“Bioinformatics is the research, development, or application of computational tools and
approaches for expanding the use of biological, medical, behavioral or health data, including
those tools and approaches to acquire, store, organize, archive, analyze or visualize such data”
Computer-based management and analysis of biological and biomedical data with useful
applications in many disciplines, particularly genomics, proteomics, metabolomics, etc…
More definitions..
“Bioinformatics is conceptualizing biology in terms of macromolecules and then applying
"informatics" techniques (derived from disciplines such as applied maths, computer science, and
statistics) to understand and organize the information associated with these molecules, on a
large-scale.”
Luscombe NM, et al. Methods Inf Med. 2001;40:346.
“Bioinformatics is a subdiscipline of biology and computer science concerned with the
acquisition, storage, analysis, and dissemination of biological data, most often DNA and amino
acid sequences. Bioinformatics uses computer programs for a variety of applications, including
determining gene and protein functions, establishing evolutionary relationships, and predicting
the three-dimensional shapes of proteins.”
National Institutes of Health (NIH)
Key point: Bioinformatics is Computer Aided Biology
Where did Bioinformatics come from?
• Bioinformatics arose as molecular biology began to be transformed by the emergence of
molecular sequence and structural data.
[Figure: computational alignment of experimentally determined sequences of a class of related
proteins; 3D structure of hemoglobin]
• Because bioinformatics depends on the collection and availability of biological data, the
question that emerges is why there is so much interest in the storage, retrieval and
analysis of this data.
Various types of Bioinformatics data
• Genomes
• Gene expressions
• DNA and RNA sequence
• DNA and RNA structure
• Protein sequence
• Protein structure
• Protein families, motifs, domains
• Protein interaction
• Chemical entities
• Systems
• Pathways
• Ontologies
• Literatures
Recap: The key dogmas of molecular biology
• DNA sequence determines protein sequence.
• Protein sequence determines protein structure.
• Protein structure determines protein function.
• Regulatory mechanisms (e.g. gene expression) determine the amount of a particular
function in space and time.
• Bioinformatics is now essential for the archiving, organization and analysis of data related to
these processes.
“The Central Dogma” Francis Crick, 1957
• Genetic Information Flow:
• The central dogma of molecular biology is an explanation of the flow of genetic
information within a biological system.
• It is often stated as "DNA makes RNA, and RNA makes protein"
“Basic” central dogma
Genome: DNA
• DNA replication (DNA -> DNA): DNA Polymerase
• Transcription (DNA -> RNA): RNA Polymerase
Transcriptome: (+) Sense RNA
• Translation (RNA -> Protein): Ribosome
Proteome: Protein
[Figure: DNA duplex, its mRNA and the encoded polypeptide]
DNA non-template strand (5’ -> 3’): CGT GGA TAC ACT TTT GCC GTT
DNA template strand (3’ -> 5’): GCA CCT ATG TGA AAA CGG CAA
mRNA (5’ -> 3’): CGU GGA UAC ACU UUU GCC GUU
Polypeptide (amino terminus -> carboxy terminus): Arg Gly Tyr Thr Phe Ala Val
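The flow shown in the figure can be traced in a few lines of Python. This is only a minimal sketch: the template strand is the one read off the figure (written 3’ -> 5’), the codon table is deliberately limited to the seven codons that appear in the example rather than the full genetic code, and the function names are illustrative choices, not part of any library.

```python
# Pairing rules used during transcription: each template DNA base
# is paired with the complementary RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Partial codon table: only the codons present in the slide's example.
CODON_TABLE = {
    "CGU": "Arg", "GGA": "Gly", "UAC": "Tyr", "ACU": "Thr",
    "UUU": "Phe", "GCC": "Ala", "GUU": "Val",
}


def transcribe(template_3to5):
    """Return the mRNA (5'->3') made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)


def translate(mrna):
    """Read the mRNA three bases at a time and look up each codon."""
    usable = len(mrna) - len(mrna) % 3  # ignore an incomplete trailing codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE[codon] for codon in codons]


template = "GCACCTATGTGAAAACGGCAA"  # template strand from the figure, 3'->5'
mrna = transcribe(template)
peptide = translate(mrna)
print(mrna)
print("-".join(peptide))
```

The result should match the mRNA and polypeptide columns of the figure. A real analysis would use a complete codon table and handle start and stop codons, which this sketch leaves out.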
“Unusual” central dogma
• DNA replication (DNA -> DNA): DNA Polymerase
• Transcription (DNA -> RNA): RNA Polymerase
• Reverse transcription (RNA -> DNA): Reverse Transcriptase
• RNA replication (RNA -> RNA), between (+) Sense RNA and (-) Sense RNA: RNA Dependent RNA Polymerase
• Translation (RNA -> Protein): Ribosome
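The two "unusual" flows, reverse transcription and RNA replication, both copy an RNA into a complementary strand. A minimal sketch of that idea follows, assuming sequences are written 5’ -> 3’ and reusing the mRNA from the basic central dogma example; the function names are hypothetical and this models only base pairing, not the enzymes.

```python
# Base-pairing rules for copying an RNA template.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
RNA_TO_RNA = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_5to3):
    """RNA -> DNA: return the complementary DNA strand, written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_5to3))


def replicate_rna(rna_5to3):
    """RNA -> RNA: return the complementary RNA strand, written 5'->3'."""
    return "".join(RNA_TO_RNA[base] for base in reversed(rna_5to3))


plus_sense = "CGUGGAUACACUUUUGCCGUU"  # mRNA from the earlier example
cdna = reverse_transcribe(plus_sense)
minus_sense = replicate_rna(plus_sense)
print(cdna)
print(minus_sense)
```

Copying the (-) sense strand once more returns the original (+) sense sequence, which is how a complementary intermediate can regenerate the starting RNA.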
Genomes (genetics?)
• The genome of an organism – collection of DNA within that organism, including the set of
genes that encode RNA molecules and proteins
• 1st complete genome of a free-living organism
• 1995- bacterium Haemophilus influenzae
• Publicly available databanks now hold more than 6 trillion bases of sequence data
• These have been collected from over 450,000 different species of organisms
• DNA sequencing technologies
• 1977 – Sanger sequencing
• 2005 – next generation sequencing
• Acquire, store, analyze and distribute the data
Post-genomic era
• Omics technologies have transformed molecular biology into a data-rich discipline by
enabling scientists to measure, at the same time, large numbers of molecular components that
operate together through a network of interactions to generate cellular functions
and phenotypic states
• Extraction of this knowledge is not easy
• Incompleteness of data
• Variability between experimental platforms
• Multiple hypothesis testing with few replicates
• Functional genomics – look into gene functioning in the body
• Personalised medicine – develop tailored treatments based on unique genetic makeup
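The multiple hypothesis testing problem listed above can be made concrete with simple arithmetic. The sketch below assumes a hypothetical experiment that tests 20,000 genes at a significance level of 0.05 (both numbers are illustrative, not from the lecture) and uses the Bonferroni correction as one common, conservative adjustment.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives if every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_genes = 20000  # hypothetical number of genes tested
alpha = 0.05     # conventional significance level

print(expected_false_positives(n_genes, alpha))
print(bonferroni_threshold(n_genes, alpha))
```

With these assumed numbers, about a thousand genes would look significant by chance alone, and the corrected cutoff becomes so strict that a handful of replicates rarely provides enough power to pass it.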
Genome & Genomics
Genome
• Complete genetic information in an organism (incl. ALL organs, tissues, cells, genes,
non-coding regions, variants)
• Eukaryotes can have two or three genomes:
• Nuclear genome (the one usually meant when not specified)
• Mitochondrial genome
• Plastid genome
Genomics
• The study of genomes, incl. large chromosomal segments containing many genes
• Aims:
• to map and sequence the entirety of a genome (general)
• To deduce information about the functions of DNA sequences (functional genomics)
-Omics
“Omics is a general term for a broad discipline of science and engineering for
analyzing the interactions of biological information objects in various ‘omes’.”
• The main focus is on:
1. Mapping information objects such as genes, proteins, and
ligands
2. Finding interaction relationships among the objects
3. Engineering the networks and objects to understand and
manipulate the regulatory mechanisms
4. Integrating various omes and omics subfields
–The Omics Wiki
-Omics studies
• Genomics- The study of the structure, function and expression of all the genes in an
organism
• Transcriptomics- The study of transcriptomes, their structure and functions
• Proteomics- The large-scale study of proteins, including their structure and function,
within a cell/system/organism.
• Metabolomics- The study of global metabolite profiles in a system (cell, tissue or
organism) under a given set of conditions
• Epigenomics, Interactomics, Phenomics, Lipidomics, Fluxomics, …
Why do we need Bioinformatics?
• Bioinformatics is necessitated by the rapidly
expanding quantities and complexity of biomolecular
data
• Bioinformatics provides methods for the efficient:
• storage
• annotation
• search and retrieval
• data integration
• data mining and analysis
Bioinformatics is essential for the archiving, organization and analysis of data
from sequencing, structural genomics, microarrays, proteomics and new high
throughput assays.
How do we do Bioinformatics?
A “bioinformatics approach” involves the application of computer algorithms, computer
models and computer databases with the broad goal of understanding the action of both
individual genes, transcripts, proteins and large collections of these entities
How do we actually do Bioinformatics?
• Pre-packaged tools and databases
• Many online
• New tools and time-consuming methods frequently require downloading
• Most are free to use
• Tool development
• Mostly on a UNIX environment
• Knowledge of programming languages frequently required (Python, Perl, R, C, Java,
Fortran)
• May require specialized or high-performance computing resources…
Skepticism & Bioinformatics
• We have to approach computational results the same way we do wet-lab results:
• Do they make sense?
• Is it what we expected?
• Do we have adequate controls, and how did they come out?
• Modeling is modeling, but biology is different...
• What does this model actually contribute? (to the biological function)
• Avoid the misuse of ‘black boxes’
• Replicability is a cornerstone of scientific research
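The question about adequate controls applies to computational work too. As a hedged illustration, the sketch below counts a motif in a sequence and compares that count against shuffled versions of the same sequence, which act as a negative control with identical base composition. The sequence is the DNA strand from the central dogma example, the motif `TTTT` is an arbitrary choice, and the function names are hypothetical. Fixing the random seed keeps the run replicable.

```python
import random


def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def shuffled_control_counts(seq, motif, n_shuffles=1000, seed=0):
    """Motif counts in shuffled copies of seq (same bases, random order)."""
    rng = random.Random(seed)  # fixed seed so the control can be reproduced
    bases = list(seq)
    counts = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        counts.append(count_motif("".join(bases), motif))
    return counts


seq = "CGTGGATACACTTTTGCCGTT"  # DNA strand from the central dogma example
motif = "TTTT"                 # arbitrary motif chosen for illustration
observed = count_motif(seq, motif)
controls = shuffled_control_counts(seq, motif)

# Fraction of shuffled controls with at least as many hits as observed.
fraction = sum(c >= observed for c in controls) / len(controls)
print(observed, fraction)
```

If shuffled sequences reach the observed count often, the hit is easily explained by base composition alone and deserves little biological weight.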
Challenges in Bioinformatics
• Explosion of information
• Need faster, automated analysis to process large amounts of data
• Need for integration between different types of information (sequence, literature,
annotations, protein levels, RNA levels, etc)
• Need for “smarter” software to identify interesting relationships in very large datasets
• Shortage of bioinformaticians/bioinformaticists
• Software needs to be easier to access, use and understand
• Biologists need to learn about the software, its limitations and how to interpret its results
Challenges in Bioinformatics
• Confusing multitude of tools available
• Each with many options and settable parameters
• Most tools and databases are written by and for nerds
• Same is true of documentation - if any exists!
• Most are developed independently
Notable exceptions are found at the:
• EBI (European Bioinformatics Institute) and
• NCBI (National Center for Biotechnology Information)
Bioinformatics research areas
• Include but are not limited to:
• Organization, classification, dissemination and analysis of biological and biomedical data
(particularly ‘-omics' data)
• Biological sequence analysis and phylogenetics
• Genome organization and evolution
• Regulation of gene expression and epigenetics
• Biological pathways and networks in healthy & disease states
• Protein structure prediction from sequence
• Modeling and prediction of the biophysical properties of biomolecules for binding
prediction and drug design
• Design of biomolecular structure and function
…With applications to Biology, Medicine, Agriculture and Industry
SUMMARY
• Bioinformatics is computer aided biology.
• Bioinformatics deals with the collection, archiving, organization, and interpretation of a
wide range of biological data.