If you work with large datasets, consider a library such as Plotly, which is designed to handle them well.
Before Doing Data Visualization: Find Datasets
Prepare the Environment
```shell
$ pip install plotly==5.15.0
$ pip install pandas
$ pip install jupyterlab
$ jupyter lab
```
Load Dataset
📘 Download World Earthquake Data From 1906-2022
head()
Shows the first n (the default is 5) rows
tail()
The “opposite” method of head() is tail()
Shows the last n (5 by default) rows of the dataframe object
info()
Prints out a concise summary of the dataframe, including information about the index, data types, columns, non-null values, and memory usage
describe()
Generates descriptive statistics, including those that summarize the central tendency, dispersion, and shape of the dataset’s distribution
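```python
import pandas as pd

# Load the earthquake data downloaded above
df = pd.read_csv('data.csv')

print(df.head())      # first 5 rows
print(df.tail())      # last 5 rows
df.info()             # prints its own summary and returns None, so no print() is needed
print(df.describe())  # descriptive statistics for the numeric columns

# Derive the year from the event timestamp
df['Year'] = pd.to_datetime(df['time']).dt.year
# Take the text after the last comma in "place" as the country and trim the leading space
df["Country"] = df["place"].str.split(pat=',', expand=False).str.get(-1).str.strip()
```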
After collection, most data requires some degree of cleaning or reformatting before it can be analyzed or used to create visualizations.
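As a minimal sketch of what such cleaning can look like, the example below parses timestamps, drops incomplete rows, and derives the `Year` and `Country` columns. The three rows are made up for illustration and only mimic the `time`, `place`, and `mag` columns used in this article; the `clean` helper is a hypothetical name, not part of pandas.

```python
import pandas as pd

# Made-up rows that only mimic the columns used in this article;
# the real values come from data.csv.
raw = pd.DataFrame({
    "time": ["2011-03-11T05:46:24Z", "not a date", "2004-12-26T00:58:53Z"],
    "place": ["near the east coast of Honshu, Japan", "somewhere", None],
    "mag": [9.1, 7.0, None],
})


def clean(df):
    """Return a copy with parsed timestamps and incomplete rows removed."""
    out = df.copy()
    # Unparseable timestamps become NaT instead of raising an error
    out["time"] = pd.to_datetime(out["time"], errors="coerce")
    # Drop rows that lack any field needed for the later charts
    out = out.dropna(subset=["time", "place", "mag"])
    # Text after the last comma, with surrounding spaces removed
    out["Country"] = out["place"].str.split(",").str.get(-1).str.strip()
    out["Year"] = out["time"].dt.year
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned[["Year", "Country", "mag"]])
```

Which rows to drop and which to repair depends on the dataset; dropping incomplete rows is only one possible choice.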
Getting Started - Plotly
Line Charts
Line charts are used to convey changes over time.
```python
import plotly.express as px

# Built-in gapminder sample data, restricted to one country
df = px.data.gapminder().query("country=='Canada'")
fig = px.line(df, x="year", y="lifeExp", title='Life expectancy in Canada')
fig.show()
```
```python
import plotly.express as px

# One line per country: color splits the data into separate traces
df = px.data.gapminder().query("continent=='Oceania'")
fig = px.line(df, x="year", y="lifeExp", color='country')
fig.show()
```
Histogram
A histogram is a graphical representation of quantitative data. Use it to visualize the frequency distribution of a single variable: values are grouped into bins, and the height of each bar shows how many observations fall into that bin.
```python
import plotly.express as px

# Built-in restaurant tips sample data
df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()
```
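To make the binning behind a histogram concrete, here is a small sketch that computes the per-bin counts by hand with pandas. The magnitudes and bin edges are illustrative assumptions, not values from the real dataset, and Plotly picks its own bins unless told otherwise.

```python
import pandas as pd

# Illustrative magnitudes, not taken from the real dataset
mags = pd.Series([5.1, 5.4, 5.9, 6.2, 6.3, 6.8, 7.1, 7.7])

# Left-closed bins: [5, 6), [6, 7), [7, 8)
bins = [5, 6, 7, 8]
binned = pd.cut(mags, bins=bins, right=False)

# Count per bin, kept in bin order rather than sorted by frequency
counts = binned.value_counts(sort=False)
print(counts)
```

Each count corresponds to the height of one histogram bar.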
Bar Charts
A bar chart is a graphical representation of categorical data.
```python
import plotly.express as px

# Long-form sample data: one row per (nation, medal) pair
long_df = px.data.medals_long()
fig = px.bar(long_df, x="nation", y="count", color="medal", title="Long-Form Input")
fig.show()
```
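Raw data usually has one row per event rather than one row per category, so the counts have to be computed before a bar chart can be drawn. The sketch below shows one way to do that with pandas; the country values are illustrative, and `per_country` is a name chosen for this example.

```python
import pandas as pd

# Illustrative categories; with the real data, use the derived "Country" column
quakes = pd.DataFrame(
    {"Country": ["Japan", "Chile", "Japan", "Indonesia", "Chile", "Japan"]}
)

# Count rows per category, most frequent first, as a two-column table
per_country = (
    quakes["Country"]
    .value_counts()
    .rename_axis("Country")
    .reset_index(name="count")
)
print(per_country)
```

The resulting table has the long-form shape that `px.bar(per_country, x="Country", y="count")` expects.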
Scatter Plots
Use a scatter plot to highlight the relationship or correlation between two variables (e.g. marketing spend and revenue, or hours of weekly exercise and cardiovascular fitness). It shows at a glance whether one variable rises or falls as the other increases.
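```python
import pandas as pd
import plotly.express as px

df = pd.read_csv('data.csv')
# Derive the year from the event timestamp
df['Year'] = pd.to_datetime(df['time']).dt.year

# One point per earthquake: year on the x-axis, magnitude on the y-axis
fig = px.scatter(df, x="Year", y="mag")
fig.show()
```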
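A scatter plot shows a relationship visually; a correlation coefficient puts a number on it. The sketch below computes the Pearson correlation with pandas for the marketing spend and revenue example; the figures are invented for illustration.

```python
import pandas as pd

# Invented paired observations for illustration
sample = pd.DataFrame({
    "spend": [1, 2, 3, 4, 5],
    "revenue": [2.1, 3.9, 6.2, 8.1, 9.8],
})

# Pearson correlation: near +1 means the two rise together,
# near -1 means one falls as the other rises, near 0 means no linear relationship
r = sample["spend"].corr(sample["revenue"])
print(round(r, 3))
```

Correlation only measures linear association, so it complements the plot rather than replacing it.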
Pie Chart
```python
import plotly.express as px

# Share of total tips by day of the week
df = px.data.tips()
fig = px.pie(df, values='tip', names='day')
fig.show()
```
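Maps
```python
import pandas as pd
import plotly.express as px

df = pd.read_csv('data.csv')

# Density heatmap of earthquakes with magnitude 7 or higher.
# Note: the "stamen-terrain" tiles may no longer load without a Stadia Maps
# account; "open-street-map" is an alternative that needs no token.
fig = px.density_mapbox(df.query("mag >= 7"), lat='latitude', lon='longitude',
                        z='mag', radius=10,
                        center=dict(lat=0, lon=180), zoom=0,
                        mapbox_style="stamen-terrain")
fig.show()
```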
```python
import pandas as pd
import plotly.express as px

df = pd.read_csv('data.csv')
# Take the text after the last comma in "place" as the country and trim the leading space
df["Country"] = df["place"].str.split(pat=',', expand=False).str.get(-1).str.strip()

# One red marker per earthquake with magnitude 7 or higher
fig = px.scatter_mapbox(df.query("mag >= 7"), lat="latitude", lon="longitude",
                        hover_name="Country", hover_data=["mag", "depth"],
                        color_discrete_sequence=["red"], zoom=3, height=300)
fig.update_layout(mapbox_style="open-street-map")
# Remove the default margins so the map fills the figure
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()
```
References