a scatter plot chart made with plotly express

It may be a symptom of the modern world we live in, but these days I expect pretty much any content I come across on the web to be interactive or responsive in some way. Charts, graphs and data visualization in general are no exception (in fact, it's in this realm that I particularly want it!). Enter Plotly, a Python library dedicated to creating clean, beautiful, interactive visualizations. As soon as I discovered Plotly I rushed to stick it into just about every project I had at hand, and was almost immediately…annoyed. My introduction to visualization in Python came primarily from matplotlib and its stylish counterpart seaborn; trying to create charts with Plotly of the same stylistic caliber ended up turning into a bit of a time sink.

Enter Plotly Express – the far more intuitive, easily styled implementation of Plotly I wanted. Hence, this quick overview of how to create charts with Plotly Express, in the hopes you’ll be able to jump right into using Plotly without wading through the stylistic mire I did.

Import the Libraries

We don’t need much in the way of libraries to start creating some beautiful graphics; sklearn and pandas are for some brief data processing, and after that it’s all plotly.express.

from sklearn import datasets
import pandas as pd

import plotly.express as px

# I'll explain the line below when we get to the styling section of the tutorial
px.defaults.template = 'simple_white'

Importing the Data

I’ll use the wine dataset from the sklearn library (since just about everyone likes to use the iris dataset) for some sample data to play with.

sklearn_wine = datasets.load_wine()
print(f'This dataset has {len(sklearn_wine.target_names)} classes')
print(f'These are the features in this dataset:\n {sklearn_wine.feature_names}')
This dataset has 3 classes
These are the features in this dataset:
 ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']

Creating a Dataframe from the SKLearn Sample Data

Plotly Express is incredibly useful when combined with the well-known pandas DataFrame. Below are some quick steps to get the sample data from our scikit-learn set into a pandas DataFrame.

wine = pd.DataFrame(sklearn_wine.data, columns=sklearn_wine.feature_names)
wine['target_name'] = pd.Categorical.from_codes(sklearn_wine.target, sklearn_wine.target_names)
# wine['target'] = pd.Series(sklearn_wine.target)
wine.head()
wine.describe()
          alcohol  malic_acid         ash  alcalinity_of_ash   magnesium
count  178.000000  178.000000  178.000000         178.000000  178.000000
mean    13.000618    2.336348    2.366517          19.494944   99.741573
std      0.811827    1.117146    0.274344           3.339564   14.282484
min     11.030000    0.740000    1.360000          10.600000   70.000000
25%     12.362500    1.602500    2.210000          17.200000   88.000000
50%     13.050000    1.865000    2.360000          19.500000   98.000000
75%     13.677500    3.082500    2.557500          21.500000  107.000000
max     14.830000    5.800000    3.230000          30.000000  162.000000

       total_phenols  flavanoids  nonflavanoid_phenols  proanthocyanins
count     178.000000  178.000000            178.000000       178.000000
mean        2.295112    2.029270              0.361854         1.590899
std         0.625851    0.998859              0.124453         0.572359
min         0.980000    0.340000              0.130000         0.410000
25%         1.742500    1.205000              0.270000         1.250000
50%         2.355000    2.135000              0.340000         1.555000
75%         2.800000    2.875000              0.437500         1.950000
max         3.880000    5.080000              0.660000         3.580000

       color_intensity         hue  od280/od315_of_diluted_wines      proline
count       178.000000  178.000000                    178.000000   178.000000
mean          5.058090    0.957449                      2.611685   746.893258
std           2.318286    0.228572                      0.709990   314.907474
min           1.280000    0.480000                      1.270000   278.000000
25%           3.220000    0.782500                      1.937500   500.500000
50%           4.690000    0.965000                      2.780000   673.500000
75%           6.200000    1.120000                      3.170000   985.000000
max          13.000000    1.710000                      4.000000  1680.000000
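As a possible shortcut, recent scikit-learn releases can hand back pandas objects directly. The sketch below assumes a scikit-learn version that supports the as_frame argument; wine_alt and class_counts are names made up for this example, not part of the walkthrough above.

```python
import pandas as pd
from sklearn import datasets

# as_frame=True asks scikit-learn for pandas objects instead of NumPy arrays
# (assumption: your installed version supports this argument).
bunch = datasets.load_wine(as_frame=True)

# bunch.frame holds the 13 feature columns plus an integer 'target' column.
wine_alt = bunch.frame.copy()

# Same labelling step as before: map the integer codes to readable class names.
wine_alt['target_name'] = pd.Categorical.from_codes(wine_alt['target'], bunch.target_names)

# Count how many wines fall into each class.
class_counts = wine_alt['target_name'].value_counts().sort_index()
print(class_counts)
```

The class counts are a quick sanity check that the labels lined up with the rows before any plotting starts.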

Scatter Plots

Scatter plots may be one of the easiest ways to visualize data correlations, but that doesn’t mean they can’t look good. We can take a fairly basic scatter plot and spruce it up a little, creating a much richer visualization of the data.

The Basics – Scatter Plot Figure

Plotly charts are stored in an object usually called fig. The fig object can be updated or changed after it’s created. Similar to matplotlib, in order to display the plot you need to run fig.show(). Let’s create a very simple scatter plot to start:

fig = px.scatter(wine,
                 x='alcohol',
                 y='flavanoids')
fig.show()

We can use the class of wine to categorize the data by color.

fig = px.scatter(wine,
                 x='alcohol',
                 y='flavanoids',
                 color='target_name')
fig.show()

Add the size argument to vary the size of the points.

fig = px.scatter(wine,
                 x='alcohol',
                 y='flavanoids',
                 color='target_name',
                 size='hue')
fig.show()

Getting Stylish

We’ve seen some of what Plotly Express can do (and there are so many other chart types to explore!), but before going any further it’s worth going over our stylistic choices. If you’re like me, you like things to be fairly minimal. Well, Plotly makes that easy to do.

Styling Arguments

There are several stylistic arguments you can pass directly to a Plotly Express function when you create the figure; from the Plotly documentation, they are:

  • title to set the figure title
  • width and height to set the figure dimensions
  • labels to override the default axis and legend labels behavior
  • category_orders to override the default category ordering behavior, which is to use the order in which the data appears in the input
  • Various color-related attributes such as color_continuous_scale, range_color, color_discrete_sequence and color_discrete_map set the colors used in the figure
  • template to apply a predefined style template

Adding the Style arguments

Let’s throw a few examples of styling arguments into our scatter plot.

fig = px.scatter(wine,
                 x='alcohol',
                 y='flavanoids',
                 color='target_name',
                 size='hue',
                 # Begin Styling Args here
                 title='Wine Characteristics by Type',
                 labels={'alcohol':'Alcohol (ABV)', 'flavanoids':'Flavanoids','target_name':'Wine Type'},
                 template='ggplot2')
fig.show()

Bonus – set the Default template when importing your libraries

Nothing looks better than consistent, clean (in my opinion) charting. You can set the default template for your project by setting px.defaults.template = 'simple_white' right after you import Plotly Express. As an extra bonus, you can customize and import your own Plotly template to really make things your own.

Styling Charts with Plotly Express

In addition to accepting style elements as arguments when creating charts with Plotly Express, you can also update an existing chart using the fig.update_layout() method.

Updating the layout to remove the x and y grid

fig.update_layout(xaxis_showgrid=False, yaxis_showgrid=False)

Styling the X and Y axes

You can also pass style arguments to format the x and y axes directly.

fig.update_xaxes(showline=True, linewidth=1, linecolor='black', zeroline=False)
fig.update_yaxes(showline=True, linewidth=1, linecolor='black', zeroline=False)

Bonus – Scatter Matrix

This was a new one to me, but you can see off the bat how useful a scatter matrix would be for getting a sense of how your data is related. You learn something new every day!

fig = px.scatter_matrix(wine,
                        dimensions=['flavanoids', 'nonflavanoid_phenols', 'color_intensity', 'hue'],
                        color='target_name')
fig.show()