Plotly - Basics

If you work with large datasets, a library like Plotly can help: it produces interactive charts that you can zoom, pan, and hover over to inspect individual data points.


Before Doing Data Visualization

Find Datasets

Prepare the Environment

$ pip install plotly==5.15.0
$ pip install pandas
$ pip install jupyterlab
$ jupyter lab

Load Dataset

📘 Download World Earthquake Data From 1906-2022 (the code below reads it from `data.csv`)

  • head()
    • Shows the first n (the default is 5) rows
  • tail()
    • The “opposite” method of head() is tail()
    • Shows the last n (5 by default) rows of the dataframe object
  • info()
    • Prints out a concise summary of the dataframe, including information about the index, data types, columns, non-null values, and memory usage
  • describe()
    • Generates descriptive statistics, including those that summarize the central tendency, dispersion, and shape of the dataset’s distribution
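import pandas as pd

# Load the earthquake dataset downloaded above
df = pd.read_csv('data.csv')

# Show the first 5 rows
print(df.head())

# Show the last 5 rows
print(df.tail())

# Concise summary (index, dtypes, non-null counts, memory usage);
# info() prints directly and returns None, so it is not wrapped in print()
df.info()

# Descriptive statistics for the numeric columns
print(df.describe())

# Extract the year from the `time` timestamp into a new `Year` column
df['Year'] = pd.to_datetime(df['time']).dt.year

# Take the last comma-separated part of `place` (usually the country or region),
# strip the leading space, and store it in a new `Country` column
df["Country"] = df["place"].str.split(pat=',').str.get(-1).str.strip()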

After collection, most data requires some degree of cleaning or reformatting before it can be analyzed or used to create visualizations.
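A minimal cleaning sketch, assuming the CSV has the `time`, `mag`, and `place` columns used above. The four rows below are made up for illustration (the place names are placeholders, not real events); in practice you would start from `pd.read_csv('data.csv')`. Rows missing a magnitude or a place are dropped before the `Year` and `Country` columns are derived:

```python
import pandas as pd

# Hypothetical rows that mimic the earthquake CSV's columns (not real events)
df = pd.DataFrame({
    "time": ["2001-05-01T10:00:00Z", "2001-08-12T03:30:00Z",
             "2010-02-27T06:34:00Z", "2010-09-03T16:35:00Z"],
    "mag": [7.1, 6.4, None, 7.3],
    "place": ["12 km N of Town A, Country X", "Region B, Country Y",
              "5 km S of Town C, Country X", None],
})

# Drop rows that lack the values the charts rely on
clean = df.dropna(subset=["mag", "place"]).copy()

# Year from the ISO 8601 timestamp
clean["Year"] = pd.to_datetime(clean["time"]).dt.year

# Last comma-separated part of `place`, without surrounding whitespace
clean["Country"] = clean["place"].str.split(pat=",").str.get(-1).str.strip()

print(clean[["Year", "mag", "Country"]])
```

Only the first two rows survive: the third has no magnitude and the fourth has no place.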


Getting Started - Plotly

Line Charts

Line charts are used to convey changes over time.

import plotly.express as px

df = px.data.gapminder().query("country=='Canada'")
fig = px.line(df, x="year", y="lifeExp", title='Life expectancy in Canada')
fig.show()
import plotly.express as px

df = px.data.gapminder().query("continent=='Oceania'")
fig = px.line(df, x="year", y="lifeExp", color='country')
fig.show()

Histogram

Use a histogram to show the distribution of a single numeric variable.
The values are grouped into bins (ranges), and the height of each bar is the number of values that fall into that bin.

import plotly.express as px

df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()

Bar Charts

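A bar chart compares a numeric value (such as a count) across categories. The example below uses long-form data, where each row holds one nation and medal pair, and colors the bars by medal type.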

import plotly.express as px

long_df = px.data.medals_long()
fig = px.bar(long_df, x="nation", y="count", color="medal", title="Long-Form Input")
fig.show()

Scatter Plots

To highlight the relationship or correlation between two variables (e.g., marketing spend and revenue, or hours of weekly exercise and cardiovascular fitness), use a scatter plot: it shows at a glance whether one variable tends to increase or decrease as the other increases.

import pandas as pd
import plotly.express as px

df = pd.read_csv('data.csv')
df['Year'] = pd.to_datetime(df['time']).dt.year
fig = px.scatter(df, x="Year", y="mag")
fig.show()

Pie Charts

import plotly.express as px

df = px.data.tips()
fig = px.pie(df, values='tip', names='day')
fig.show()

Maps

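import pandas as pd
import plotly.express as px

df = pd.read_csv('data.csv')

# Draw a density map of the events with `mag >= 7`
# "open-street-map" tiles need no Mapbox access token; the Stamen styles
# ("stamen-terrain") used in older examples may no longer render
fig = px.density_mapbox(df.query("mag >= 7"), lat='latitude', lon='longitude', z='mag', radius=10,
                        center=dict(lat=0, lon=180), zoom=0, mapbox_style="open-street-map")
fig.show()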
import pandas as pd
import plotly.express as px

df = pd.read_csv('data.csv')

# Pre-process the `place` field: keep the last comma-separated part, trimmed
df["Country"] = df["place"].str.split(pat=',').str.get(-1).str.strip()

# Plot the events with `mag >= 7` as red markers; hovering shows country, magnitude, and depth
fig = px.scatter_mapbox(df.query("mag >= 7"), lat="latitude", lon="longitude",
                        hover_name="Country", hover_data=["mag", "depth"],
                        color_discrete_sequence=["red"], zoom=3, height=300)
# OpenStreetMap tiles need no Mapbox access token; remove the outer margins
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()
