Data Visualization Using Python
Dr. Muhammad Hanif
Department of Computer Science, Electrical and Space Engineering
Lulea University of Technology, Sweden
Data Scientist in 2D….
[Link]
Data Scientist in 3D….
[Link]
Data Scientist in 5D….
[Link]
Data Scientist's Responsibilities
[Link]
Why Python?
• General purpose
• IPython
• Popular and mature (in both API and community support)
• Glue language (high-level APIs, low-level C/Fortran bindings)
• Science ecosystem (growing!)
Python’s Popularity: Widespread Knowledge and Many Tools
[Link]
Avoid the Two-Language Problem
Python’s Usage: Spread Over Whole Data Science Workflow
[Link]
One day at FB’s Data Science: a member could…
author a multistage (a) processing pipeline in Python,
design a hypothesis test, perform a (b) regression analysis
over a data sample with R, design and implement an
(c) algorithm for some data-intensive service in Hadoop,
or (d) communicate the results of our analysis.
Jeff Hammerbacher
[Link]
Python Fits All!
Python: Tools
• Interactivity / Collaboration
  o IPython
  o Jupyter
• Data Wrangling / Analysis
  o NumPy
  o pandas
• Data Visualization
  o Matplotlib
  o Seaborn etc.
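A minimal sketch of how these tools might fit together in one script, assuming NumPy, pandas, and Matplotlib are installed. The data, the column names, and the output file name are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 100 noisy measurements split into two made-up groups.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "group": np.repeat(["a", "b"], 50),
    "value": np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(2.0, 1.0, 50)]),
})

# Data wrangling / analysis with pandas: per-group mean and standard deviation.
summary = df.groupby("group")["value"].agg(["mean", "std"])

# Data visualization with Matplotlib: group means with error bars.
fig, ax = plt.subplots()
ax.bar(summary.index, summary["mean"], yerr=summary["std"], capsize=4)
ax.set_xlabel("group")
ax.set_ylabel("mean value")
fig.savefig("group_means.png")
```

In a Jupyter notebook the same code would display the figure inline, which is where the interactivity tools on this slide come in.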
Why Visualize?
Visualize to Analyze
• Patterns
• Correlation
• Trends
Make a decision based on a massive dataset IN ONE LOOK
Visualize to Discover
Interactive Visualization: Lets You Discover Information
[Link]
Visualize to Support a Story
Visualize to Tell a Story by Itself
Distribution of Global Wealth
[Link]
Visualize to Teach
Our brain processes visuals 60,000 times faster than text
[Link]
Python Libraries for Data Visualization
Data Science Is Becoming Important for the Python Community
6 of the 25 most popular libraries are for data science
[Link]
The Science Stack Is Getting Better Each Day
[Link]
Matplotlib
• A Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
• The forerunner among Python libraries for data visualization.
• “is extremely powerful but with that power comes complexity.”
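As a small illustration of the "variety of hardcopy formats" point, here is a sketch that draws one figure and writes it to PNG, PDF, and SVG. It assumes Matplotlib and NumPy are installed; the plotted functions and the file names are placeholders chosen for this example.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

# Write the same figure to several hardcopy formats (placeholder file names).
saved = []
for ext in ("png", "pdf", "svg"):
    path = Path(f"sine_cosine.{ext}")
    fig.savefig(path, dpi=300)
    saved.append(path)
```

Even this simple plot needs explicit calls for labels, title, and legend, which hints at the complexity mentioned in the quote above.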
Matplotlib
[Link]
Seaborn
• Harnesses the power of Matplotlib to create beautiful charts in a few lines of code.
• The key difference from Matplotlib is that Seaborn’s default styles and color palettes are designed to be more aesthetically pleasing and modern.
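A short sketch of the "few lines of code" idea, assuming Seaborn (version 0.11 or later, for `set_theme`), pandas, and NumPy are installed. The data frame and its column names are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

sns.set_theme()  # switch on Seaborn's default style and color palette

# Hypothetical data: a noisy linear relationship with a made-up category column.
rng = np.random.default_rng(seed=1)
df = pd.DataFrame({
    "x": rng.normal(size=200),
    "kind": rng.choice(["a", "b"], size=200),
})
df["y"] = 2.0 * df["x"] + rng.normal(scale=0.5, size=200)

# One call produces a styled scatter plot, colored by category, with a legend.
ax = sns.scatterplot(data=df, x="x", y="y", hue="kind")
ax.figure.savefig("seaborn_scatter.png")
```

The returned object is an ordinary Matplotlib `Axes`, so Matplotlib calls can still be used for fine-tuning.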
Seaborn
[Link]
ggplot
• A plotting system for Python based on R's ggplot2 and the Grammar of Graphics.
• Layers components to create a complete plot.
ggplot
Bokeh
• Is also based on the Grammar of Graphics, but unlike ggplot, it’s native to Python, not ported over from R.
• Supports streaming and real-time data.
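The sketch below layers two glyphs on one Bokeh figure and then appends new points with `ColumnDataSource.stream`. It assumes Bokeh is installed; the readings, column names, and title are hypothetical, and in a real-time application the stream call would typically run inside a Bokeh server callback rather than in a one-off script.

```python
from bokeh.embed import file_html
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.resources import CDN

# Hypothetical readings; in a live application these would arrive over time.
source = ColumnDataSource(data={"t": [0, 1, 2, 3], "value": [1.0, 1.5, 1.2, 1.8]})

p = figure(title="Streaming sketch", x_axis_label="t", y_axis_label="value")
p.line(x="t", y="value", source=source, line_width=2)  # first layer: a line glyph
p.scatter(x="t", y="value", source=source, size=8)     # second layer: point markers

# Append two new points; rollover=5 keeps only the five most recent rows.
source.stream({"t": [4, 5], "value": [2.1, 1.9]}, rollover=5)

# Render the plot as a standalone HTML page that loads BokehJS from the CDN.
html = file_html(p, CDN, "Streaming sketch")
```

Both glyphs share one data source, so a single `stream` call updates the line and the markers together.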
Bokeh
[Link]
pygal
• Offers interactive plots that can be embedded in the web browser.
  o Its prime differentiator is the ability to output charts as SVGs.
• Each chart type is packaged into a method and the built-in styles are pretty.
  o It’s easy to create a nice-looking chart in a few lines of code.
pygal
[Link]
plotly
• Specializes in making interactive plots, but it also offers some charts you won’t find in most libraries, like contour plots, dendrograms, and 3D charts.
plotly
[Link]
geoplotlib
• A toolbox for creating maps and plotting geographical data.
• You can use it to create a variety of map types, like choropleths, heatmaps, and dot density maps.
geoplotlib
[Link]