Introduction to Machine Learning
Agenda
• Introduction
• Basics
• Classification
• Clustering
• Regression
• Use-Cases
Quick Questionnaire
How many people have heard about Machine Learning?
How many people know about Machine Learning?
How many people are using Machine Learning?
About
• subfield of Artificial Intelligence (AI)
• name is derived from the concept that it deals with
“construction and study of systems that can learn from
data”
• can be seen as building blocks to make computers
learn to behave more intelligently
• It is a theoretical concept. There are various
techniques with various implementations.
• http://en.wikipedia.org/wiki/Machine_learning
In other words…
“A computer program is said to learn from
experience (E) with respect to some class of tasks (T)
and a performance measure (P) if its performance
at tasks in T, as measured by P, improves with
experience E”
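The definition can be made concrete with a toy learner. This is only a sketch under made-up assumptions: the task T is deciding whether a number is at or above a hidden cut-off, the performance measure P is accuracy on a fixed test set, and the experience E is the stream of labelled examples. The names `HIDDEN_CUTOFF`, `learn_cutoff` and the example stream are hypothetical.

```python
HIDDEN_CUTOFF = 30  # assumption: known to the "world", not to the learner


def true_label(x):
    """Ground truth for task T: is x at or above the hidden cut-off?"""
    return x >= HIDDEN_CUTOFF


def learn_cutoff(examples):
    """Estimate the cut-off as the midpoint between the largest negative
    and the smallest positive example seen so far."""
    negatives = [x for x, label in examples if not label]
    positives = [x for x, label in examples if label]
    return (max(negatives) + min(positives)) / 2


def accuracy(cutoff, test_points):
    """Performance measure P: fraction of test points classified correctly."""
    correct = sum((x >= cutoff) == true_label(x) for x in test_points)
    return correct / len(test_points)


test_points = list(range(100))
stream = [5, 95, 40, 25, 32, 29, 30]  # order in which labelled examples arrive
experience = []                       # experience E grows one example at a time
scores = []
for x in stream:
    experience.append((x, true_label(x)))
    if len(experience) >= 2:  # the first two examples give one negative, one positive
        scores.append(accuracy(learn_cutoff(experience), test_points))

print(scores)  # [0.8, 0.93, 0.97, 0.99, 0.99, 1.0]
```

With this particular stream P never decreases as E grows. That is a property of the chosen examples, not a general guarantee.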
Terminology
• Features
– Features are the distinct traits that can be used to describe
each item in a quantitative manner.
• Samples
– A sample is an item to process (e.g. classify). It can be a
document, a picture, a sound, a video, a row in database or CSV
file, or whatever you can describe with a fixed set of quantitative
traits.
• Feature vector
– An n-dimensional vector of numerical features that represents
some object.
• Feature extraction
– Preparation of the feature vector
– Transforms data from a high-dimensional space to a space of
fewer dimensions.
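As a rough illustration of the terms above, the sketch below turns one sample (a tiny 4x4 grayscale image, i.e. 16 raw values) into a 3-dimensional feature vector. The function `extract_features` and the three chosen features are assumptions made for this example only.

```python
def extract_features(pixels):
    """Map a grayscale image (rows of 0-255 ints) to a 3-dimensional feature vector."""
    flat = [p for row in pixels for p in row]
    mean_brightness = sum(flat) / len(flat)
    brightest = max(flat)
    dark_fraction = sum(p < 128 for p in flat) / len(flat)
    return [mean_brightness, brightest, dark_fraction]


# One sample: left half black, right half white (16 raw pixel values).
sample = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

vector = extract_features(sample)
print(vector)  # [127.5, 255, 0.5]
```

Every sample processed this way yields a vector of the same length, which is what lets a learning algorithm compare samples.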
Let’s dig deep into it…
What do you mean by Apple?
Learning (Training)
Features:
1. Color: Reddish/Red
2. Type: Fruit
3. Shape
etc…
Features:
1. Color: Sky Blue
2. Type: Logo
3. Shape
etc…
Features:
1. Color: Yellow
2. Type: Fruit
3. Shape
etc…
Workflow
Categories
• Supervised Learning
• Unsupervised Learning
• Semi-Supervised Learning
• Reinforcement Learning
Supervised Learning
• Supervised learning is where you have input variables (X) and an
output variable (Y) and you use an algorithm to learn the mapping
function from the input to the output.
• Y = f(X)
• Supervised learning problems can be further grouped into
regression and classification problems.
• Classification: A classification problem is when the output
variable is a category, such as “red” or “blue”, or “disease” or
“no disease”.
• Regression: A regression problem is when the output variable
is a real value, such as “dollars” or “weight”.
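Both kinds of supervised problem can be sketched in a few lines of plain Python. The data below is invented for illustration, and the two methods (a nearest-neighbour rule for classification, a least-squares line for regression) are just two simple choices among many, not the only way to learn Y = f(X).

```python
def nearest_neighbour_classify(train, x):
    """Classification: return the label of the training sample closest to x."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Classification: the output Y is a category (hypothetical temperature data).
labelled = [(36.5, "no disease"), (36.9, "no disease"),
            (38.6, "disease"), (39.4, "disease")]
print(nearest_neighbour_classify(labelled, 39.1))  # disease
print(nearest_neighbour_classify(labelled, 36.7))  # no disease

# Regression: the output Y is a real value.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # 11.0
```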
Unsupervised Learning
• Unsupervised learning is where you only have input data (X) and
no corresponding output variables.
• The goal for unsupervised learning is to model the underlying
structure or distribution in the data in order to learn more about the
data.
• Unsupervised learning problems can be further grouped into
clustering and association problems
• Clustering: A clustering problem is where you want to
discover the inherent groupings in the data, such as grouping
customers by purchasing behavior.
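• Association: An association rule learning problem is where
you want to discover rules that describe large portions of your
data, such as people who buy X also tend to buy Y.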
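For the clustering case, a minimal sketch using k-means on one-dimensional data follows. The spend figures and starting centroids are made up, and k-means is only one of several clustering algorithms that could be used here.

```python
def k_means_1d(values, centroids, max_iters=100):
    """Group 1-D values around the given starting centroids (k-means)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids


# Hypothetical monthly spend per customer; no labels are provided.
spend = [10, 12, 11, 95, 100, 98]
labels, centroids = k_means_1d(spend, centroids=[min(spend), max(spend)])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)  # first centroid is 11.0, second is about 97.67
```

The algorithm is never told which customers belong together. The two groups emerge from the data alone.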
Semi-Supervised Learning
• Problems where you have a large amount of input data (X) and
only some of the data is labeled (Y) are called semi-supervised
learning problems.
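One simple way to use the unlabeled data is self-training: repeatedly label the unlabeled point that sits closest to an already labeled one. The sketch below assumes one-dimensional data and a nearest-neighbour rule; the data and the function name `self_train` are hypothetical, and other semi-supervised methods exist.

```python
def self_train(labelled, unlabelled):
    """Spread labels from a few labelled points to the unlabelled ones."""
    labelled = list(labelled)
    remaining = list(unlabelled)
    while remaining:
        # For each unlabelled point find its nearest labelled neighbour,
        # then take the point whose neighbour is closest overall.
        x, (_, label) = min(
            ((u, min(labelled, key=lambda p: abs(p[0] - u))) for u in remaining),
            key=lambda item: abs(item[0] - item[1][0]),
        )
        labelled.append((x, label))  # treat the pseudo-label as a real label
        remaining.remove(x)
    return labelled


# Only two of the eight points come with a label (Y).
labelled = [(1.0, "A"), (10.0, "B")]
unlabelled = [2.0, 3.0, 4.0, 9.0, 8.0, 7.0]

result = dict(self_train(labelled, unlabelled))
print(result[4.0], result[7.0])  # A B
```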
Reinforcement Learning
• Reinforcement learning allows the machine or software agent to
learn its behavior based on feedback from the environment.
• This behavior can be learnt once and for all, or can keep adapting
as time goes by.
• Application: Energy management based on consumption
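A minimal sketch of learning from feedback: an agent chooses between two actions, receives a reward from the environment, and updates its estimate of each action's value. Everything here is an assumption made for illustration: the fixed rewards in `feedback`, the optimistic starting values (one simple way to make the agent try both actions), and the constant step size.

```python
def feedback(action):
    """Environment: a hypothetical fixed reward for each of two actions."""
    return {0: 0.2, 1: 1.0}[action]


def learn(steps=50, step_size=0.5, initial_value=5.0):
    """Greedy agent with optimistic initial value estimates."""
    values = [initial_value, initial_value]
    history = []
    for _ in range(steps):
        # Pick the action that currently looks best (ties go to action 0).
        action = max(range(len(values)), key=lambda a: values[a])
        reward = feedback(action)
        # Move the estimate part of the way towards the observed reward.
        values[action] += step_size * (reward - values[action])
        history.append(action)
    return values, history


values, history = learn()
best_action = max(range(len(values)), key=lambda a: values[a])
print(best_action)   # 1
print(history[:8])   # [0, 1, 1, 0, 1, 1, 0, 1]
print(history[-5:])  # [1, 1, 1, 1, 1]
```

Because the step size is constant, the estimates would keep tracking the rewards if the environment changed later, which matches the "keep adapting" case. Shrinking the step size to zero over time would correspond to behavior learnt once and for all.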
Techniques
• classification: predict a class from observations
• clustering: group observations into “meaningful” groups
• regression (prediction): predict a numeric value from observations
Dataset
• Visual dataset
• Text dataset
• Audio dataset
Visual Dataset
• Object Classification
• Object Detection
• Scene Recognition
• Activity Detection
• Video Captioning
• Video Summarization
Text Dataset
• Text Classification
• Text Summarization
• Question Answer System
Audio Dataset
• Speech Recognition
• Text to Speech
• Sound Classification