It may be a symptom of the modern world we live in, but these days I expect pretty much any content I come across on the web to be interactive or responsive in some way. Charts, graphs and data visualizations in general are no exception (in fact, it’s in this realm that I particularly want interactivity!). Enter Plotly, a Python library dedicated to creating clean, beautiful, interactive visualizations. As soon as I discovered Plotly I rushed to stick it into just about every project I had at hand, and was almost immediately…annoyed. My introduction to visualization in Python came primarily from matplotlib and its stylish counterpart seaborn; trying to create charts with Plotly of the same stylistic caliber ended up turning into a bit of a time sink.
Enter Plotly Express – the far more intuitive, easily styled implementation of Plotly I wanted. Hence, this quick overview of how to create charts with Plotly Express, in the hopes you’ll be able to jump right into using Plotly without wading through the stylistic mire I did.
Import the Libraries
We don’t need much in the way of libraries to start creating some beautiful graphics; sklearn and pandas are for some brief data processing, and after that it’s all plotly.express.
from sklearn import datasets
import pandas as pd
import plotly.express as px

# I'll explain the line below when we get to the styling section of the tutorial
px.defaults.template = 'simple_white'
Importing the Data
I’ll use the wine dataset from the sklearn library (since just about everyone likes to use the iris dataset) for some sample data to play with.
sklearn_wine = datasets.load_wine()
print(f'This dataset has {len(sklearn_wine.target_names)} classes')
print(f'These are the features in this dataset:\n {sklearn_wine.feature_names}')

This dataset has 3 classes
These are the features in this dataset:
 ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
Creating a Dataframe from the SKLearn Sample Data
Plotly Express is incredibly useful when combined with the well-known pandas DataFrame. Below are some quick steps to get the sample data from our scikit-learn set into a pandas DataFrame.
wine = pd.DataFrame(sklearn_wine.data, columns=sklearn_wine.feature_names)
wine['target_name'] = pd.Categorical.from_codes(sklearn_wine.target, sklearn_wine.target_names)
# wine['target'] = pd.Series(sklearn_wine.target)
wine.head()
wine.describe()
statistic | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 | 178.000000 |
mean | 13.000618 | 2.336348 | 2.366517 | 19.494944 | 99.741573 | 2.295112 | 2.029270 | 0.361854 | 1.590899 | 5.058090 | 0.957449 | 2.611685 | 746.893258 |
std | 0.811827 | 1.117146 | 0.274344 | 3.339564 | 14.282484 | 0.625851 | 0.998859 | 0.124453 | 0.572359 | 2.318286 | 0.228572 | 0.709990 | 314.907474 |
min | 11.030000 | 0.740000 | 1.360000 | 10.600000 | 70.000000 | 0.980000 | 0.340000 | 0.130000 | 0.410000 | 1.280000 | 0.480000 | 1.270000 | 278.000000 |
25% | 12.362500 | 1.602500 | 2.210000 | 17.200000 | 88.000000 | 1.742500 | 1.205000 | 0.270000 | 1.250000 | 3.220000 | 0.782500 | 1.937500 | 500.500000 |
50% | 13.050000 | 1.865000 | 2.360000 | 19.500000 | 98.000000 | 2.355000 | 2.135000 | 0.340000 | 1.555000 | 4.690000 | 0.965000 | 2.780000 | 673.500000 |
75% | 13.677500 | 3.082500 | 2.557500 | 21.500000 | 107.000000 | 2.800000 | 2.875000 | 0.437500 | 1.950000 | 6.200000 | 1.120000 | 3.170000 | 985.000000 |
max | 14.830000 | 5.800000 | 3.230000 | 30.000000 | 162.000000 | 3.880000 | 5.080000 | 0.660000 | 3.580000 | 13.000000 | 1.710000 | 4.000000 | 1680.000000 |
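As an aside, recent versions of scikit-learn can hand back pandas objects directly, which skips the manual DataFrame construction. The sketch below assumes scikit-learn 0.23 or later (where the as_frame argument is available); the wine_alt name is just mine for illustration, and the target_name column is rebuilt to mirror the one created above.

```python
from sklearn import datasets

# as_frame=True asks scikit-learn to return pandas objects
# (assumes scikit-learn 0.23 or later).
bunch = datasets.load_wine(as_frame=True)

# bunch.frame holds the 13 feature columns plus an integer 'target' column.
wine_alt = bunch.frame.copy()

# Map the integer class codes to their names, mirroring 'target_name' above.
code_to_name = dict(enumerate(bunch.target_names))
wine_alt['target_name'] = wine_alt['target'].map(code_to_name)

print(wine_alt.shape)
print(wine_alt['target_name'].value_counts())
```

Either route should give you the same feature columns; the manual version simply makes each step explicit.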
Scatter Plots
It may be one of the easiest ways to visualize data correlations, but that doesn’t mean scatter plots can’t look good. We can take a fairly basic scatter plot and spruce it up a little, creating a much richer visualization of the data.
The Basics – Scatter Plot Figure
Plotly charts are stored in an object usually called fig. The fig object can be updated or changed after it’s created. Similar to matplotlib, in order to display the plot you need to run fig.show(). Let’s create a very simple scatter plot to start:
fig = px.scatter(wine, x='alcohol', y='flavanoids')
fig.show()
We can use the class of wine to categorize the data by color.
fig = px.scatter(wine, x='alcohol', y='flavanoids', color='target_name')
fig.show()
Add the size argument to vary the size of the points.
fig = px.scatter(wine, x='alcohol', y='flavanoids', color='target_name', size='hue')
fig.show()
Getting Stylish
We’ve seen some of what Plotly Express can do (and there are so many other chart types to explore!), but before going any further it’s worth going over our stylistic choices. If you’re like me, you like things to be fairly minimal. Well, Plotly makes that easy to do.
Styling Arguments
There are several stylistic arguments you can pass directly to the Plotly Express function when creating the fig object; from the Plotly documentation, they are:
- title to set the figure title
- width and height to set the figure dimensions
- labels to override the default axis and legend labels behavior
- category_orders to override the default category ordering behavior, which is to use the order in which the data appears in the input
- Various color-related attributes such as color_continuous_scale, range_color, color_discrete_sequence and color_discrete_map set the colors used in the figure
- template to apply a predefined style template
Adding the Style arguments
Let’s throw a few examples of styling arguments into our scatter plot.
fig = px.scatter(wine, x='alcohol', y='flavanoids', color='target_name', size='hue',
                 # Begin Styling Args here
                 title='Wine Characteristics by Type',
                 labels={'alcohol': 'Alcohol (ABV)', 'flavanoids': 'Flavanoids', 'target_name': 'Wine Type'},
                 template='ggplot2')
fig.show()
Bonus – set the Default template when importing your libraries
Nothing looks better than consistent, clean (in my opinion) charting. You can set the default template for your project by setting px.defaults.template = 'simple_white' when you first import Plotly Express. As an extra bonus, you can customize and import your own Plotly template to really make things your own.
Styling Charts with Plotly Express
In addition to accepting style elements as arguments when creating charts with Plotly Express, you can also update an existing chart using the fig.update_layout() method.
Updating the layout to remove the x and y grid
fig.update_layout(xaxis_showgrid=False, yaxis_showgrid=False)
Styling the X and Y axes
You can pass style arguments to directly format the x and y axes, too.
fig.update_xaxes(showline=True, linewidth=1, linecolor='black', zeroline=False)
fig.update_yaxes(showline=True, linewidth=1, linecolor='black', zeroline=False)
Bonus – Scatter Matrix
This was a new one to me, but you can see off the bat how useful a scatter matrix would be for getting a sense of how your data is related. You learn something new every day!
fig = px.scatter_matrix(wine,
                        dimensions=['flavanoids', 'nonflavanoid_phenols', 'color_intensity', 'hue'],
                        color='target_name')
fig.show()