14 Data Visualization Plots of Seaborn

Tools for the cops who investigate data and extract information and trends from it.

Aayush Ostwal
Towards Data Science


Green Colour represents new beginnings and growth. It also signifies renewal and abundance.

Data visualization plays a very important role in data mining. Many data scientists spend much of their time exploring data through visualization. To speed up this process, we need well-organized documentation of all the plots.

Even plentiful resources can’t be turned into valuable goods without planning and architecture. I therefore hope this article gives you a well-organized overview of all the plots along with their documentation.

Content

  1. Introduction
  2. Know your Data
  3. Distribution Plots
    a. Distplot
    b. Joint Plot
    c. Pair Plot
    d. Rug Plot
  4. Categorical Plots
    a. Bar Plot
    b. Count Plot
    c. Box Plot
    d. Violin Plot
  5. Advanced Plots
    a. Strip Plot
    b. Swarm Plot
  6. Matrix Plots
    a. Heat Map
    b. Cluster Map
  7. Grids
    a. Facet Grid
  8. Regression Plots

Introduction

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

For the installation of Seaborn, you may run any of the following in your command line.

pip install seaborn
conda install seaborn

To import seaborn you can run the following command.

import seaborn as sns

Know Your Data

The data set used in these plots is the famous Titanic data set (Fig. 1). From here on, the data set is represented by the variable ‘df’.

Fig. 1: Titanic Data set
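
Since Fig. 1 is an image, a quick way to get a similar overview in code is sketched below. The tiny DataFrame is a made-up stand-in that only borrows the column names used in this article (the real data set has more rows and columns); with the real file you would typically build df with pandas.read_csv instead.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the Titanic data: only the column names follow the article.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Pclass": [3, 1, 3, 1, 2, 2],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, np.nan, 54.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.46, 51.86],
})

print(df.head())   # first rows, similar in spirit to Fig. 1
print(df.dtypes)   # which columns are numeric and which are text

# Distribution plots need numeric columns, categorical plots need a column to group by,
# so it helps to know which is which before plotting.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
missing_per_column = df.isnull().sum()
print(numeric_cols)
print(missing_per_column)
```

Knowing the numeric columns and the number of missing values up front makes it easier to pick the right plot in the sections that follow.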

Distribution Plots

These plots help us visualize the distribution of data. We can use them to understand the mean, median, range, variance, deviation, etc. of the data.

a. Distplot

  • A distplot gives us the histogram of the selected continuous variable.
  • It is an example of univariate analysis.
  • We can change the number of bins, i.e. the number of vertical bars in the histogram.
import seaborn as sns
# Histogram of passenger ages, split into 10 bins
sns.distplot(x = df['Age'], bins = 10)
Fig. 2: Distribution Plot for ‘Age’ of Passengers.
  • Here the x-axis is the age and the y-axis displays frequency. For example, for bins = 10, there are around 50 people aged 0 to 10.

b. Joint Plot

  • It combines the distplots of two variables.
  • It is an example of bivariate analysis.
  • We additionally obtain a scatter plot between the two variables, which reflects their relationship. We can change the scatter plot into a hexagonal plot, where a higher color intensity means a larger number of observations.
import seaborn as sns
# For Plot 1
sns.jointplot(x = df['Age'], y = df['Fare'], kind = 'scatter')
# For Plot 2
sns.jointplot(x = df['Age'], y = df['Fare'], kind = 'hex')
Fig. 3: Joint plots between ‘Age’ and ‘Fare’
  • We can see that there is no clear linear relation between age and fare.
  • kind = ‘hex’ provides the hexagonal plot and kind = ‘reg’ provides a regression line on the graph.

c. Pair Plot

  • It takes all the numerical attributes of the data and plots pairwise scatter plots for every pair of different variables, and a histogram for each variable on its own.
import seaborn as sns
sns.pairplot(df)
Fig. 4: Pair Plot of the titanic Data set

d. Rug Plot

  • It draws a small dash for every observation along the axis, instead of grouping the observations into bins as distplot does.
  • It is an example of a univariate analysis.
import seaborn as sns
sns.rugplot(x = df['Age'])
Fig. 5: Rug Plot for ‘Age’ of Passengers

Categorical Plots

These plots help us understand the categorical variables. We can use them for both univariate and bivariate analysis.

a. Bar Plot

  • It is an example of bivariate analysis.
  • On the x-axis, we have a categorical variable and on the y-axis, we have a continuous variable.
import seaborn as sns
sns.barplot(x = df['Sex'], y = df['Fare'])
Fig. 6: Bar plot for ‘Fare’ and ‘Sex’
  • We can infer that the average fare is higher for females than males.
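
By default, the height of each bar is the mean of the y variable within that category (the estimator can be changed). The same numbers can be computed with a pandas groupby, sketched here on made-up fares:

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import pandas as pd
import seaborn as sns

# Made-up fares; only the column names follow the article.
df = pd.DataFrame({
    "Sex":  ["female", "female", "male", "male", "male"],
    "Fare": [80.0, 60.0, 10.0, 30.0, 20.0],
})

# Mean fare per category: these are the values the bars represent.
mean_fare = df.groupby("Sex")["Fare"].mean()
print(mean_fare)

ax = sns.barplot(data=df, x="Sex", y="Fare")
```

The thin line on top of each bar is seaborn's error bar, which indicates the uncertainty around the estimated mean.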

b. Count Plot

  • It counts the number of occurrences of categorical variables.
  • It is an example of a univariate analysis.
import seaborn as sns
sns.countplot(x = df['Pclass'])
Fig. 7: Count Plot for Survived and ‘P-class’.

c. Box Plot

  • It is a 5 point summary plot. It gives information about the maximum, minimum, median, first quartile, and third quartile of a continuous variable. It also shows us the outliers.
  • We can plot this for a single continuous variable, or compare a continuous variable across the categories of a categorical variable.
import seaborn as sns
#For plot 1
sns.boxplot(y = df['Age'])
#For plot 2
sns.boxplot(y = df['Age'], x = df['Sex'])
Fig.8: a) Box plot of ‘Age’, b) Box plot of different categories in ‘sex’ for ‘Age’
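
The numbers behind a box plot can be reproduced with pandas. The sketch below uses made-up ages and the common 1.5 × IQR rule, which is the default whisker setting in seaborn and matplotlib: points beyond the fences are drawn as outliers.

```python
import pandas as pd

# Made-up ages, chosen so that the quartiles come out as round numbers.
ages = pd.Series([1, 20, 22, 24, 26, 28, 30, 32, 70], name="Age")

# The box spans the first to the third quartile; the line inside it is the median.
q1, median, q3 = ages.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Whiskers reach to the most extreme points that still lie inside these fences.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]

print(q1, median, q3)            # 22.0 26.0 30.0
print(lower_fence, upper_fence)  # 10.0 42.0
print(outliers.tolist())         # [1, 70]
```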

d. Violin Plot

  • It is similar to the Box plot, but it gives supplementary information about the distribution too.
import seaborn as sns
sns.violinplot(y = df['Age'], x = df['Sex'])
Fig. 9: Violin Plot between ‘Age’ and ‘Sex’

Advanced Plots

As the name suggests, they are advanced because they combine distribution plots with categorical encodings.

a. Strip Plot

  • It’s a plot between a continuous variable and a categorical variable.
  • It is drawn like a scatter plot, but additionally uses the categories of the categorical variable to group the points.
import seaborn as sns
sns.stripplot(y = df['Age'], x = df['Pclass'])
Fig.10: Strip Plot between ‘Age’ and ‘P-class’
  • We can observe that in class 1 and class 2, children around 10 years old are absent, and that people above 60 are mostly accommodated in class 1.
  • Usually, these types of observations are used to impute missing values.
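
One way to turn such an observation into an imputation rule is to fill each missing age with the median age of the passenger's class. This is only a sketch on made-up data, not necessarily the method used for the figures in this article:

```python
import numpy as np
import pandas as pd

# Made-up data: one missing age in class 1 and one in class 3.
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Age":    [60.0, 50.0, np.nan, 20.0, 24.0, np.nan],
})

# Median age of each class, aligned row by row with the original frame.
class_median = df.groupby("Pclass")["Age"].transform("median")

# Keep known ages and fill the gaps with the median of the matching class.
df["Age_filled"] = df["Age"].fillna(class_median)
print(df)
```

Here the missing class 1 age becomes 55 and the missing class 3 age becomes 22, instead of both receiving one overall average.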

b. Swarm Plot

  • It is the combination of a strip plot and a violin plot.
  • Along with the number of data points, it also provides their respective distribution.
import seaborn as sns
sns.swarmplot(y = df['Age'], x = df['Pclass'])
Fig. 11: Swarm Plot between ‘Age’ and ‘P-class’

Matrix Plots

These are special types of plots that use two-dimensional matrix data for visualization. It is difficult to analyze and spot patterns in matrix data because of its large dimensions. Matrix plots make the process easier by color coding the matrix data.

a. Heat Map

  • In the given raw dataset ‘df’, we have seven numeric variables. So, let us generate a correlation matrix between these seven variables.
# numeric_only = True skips text columns such as names (needed in pandas 2.0 and later)
df.corr(numeric_only = True)
Fig. 12: Correlation matrix
  • It is very difficult to read every value, even though there are only 49 of them. The difficulty grows as we move toward thousands of features.
    So, let us try some color coding and see how much easier the interpretation becomes.
sns.heatmap(df.corr(numeric_only = True), annot = True, cmap = 'viridis')
Fig. 13: Heat Map of the correlation matrix of the titanic data set.
  • The same matrix is now articulating more information.
  • Another common use of heat maps is to understand missing value patterns. In Fig. 14, each yellow dash represents a missing value, which makes it much easier to identify where values are missing.
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')
Fig. 14: Heat Map for missing values in titanic data.
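
The two heat maps can be complemented with a few numbers. The sketch below assumes a made-up DataFrame with the article's column names. It selects the numeric columns explicitly before computing the correlation, and fixes the color scale to the range -1 to 1 so that colors stay comparable between data sets.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for the Titanic data, with one missing age.
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 2, 2],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, np.nan, 54.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.46, 51.86],
})

# Correlation only makes sense for numeric columns, so text columns are dropped first.
corr = df.select_dtypes(include="number").corr()
ax = sns.heatmap(corr, annot=True, cmap="viridis", vmin=-1, vmax=1)

# Share of missing values per column: a numeric companion to the missing value heat map.
missing_share = df.isnull().mean()
print(missing_share)
```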

b. Cluster Map

  • If we have matrix data and want to group some features according to their similarity, cluster maps can assist us. First have a look at the heat map (Fig. 13), and then look at the cluster map (Fig. 15).
sns.clustermap(df.corr(numeric_only = True), annot = True, cmap = 'viridis')
Fig. 15: Cluster map for correlation matrix of titanic data
  • The x-labels and y-labels are the same, but they are ordered differently. That is because they are grouped according to their similarity.
  • The tree-like structures (dendrograms) at the top and left describe their degree of similarity.
  • Cluster maps use Hierarchical clustering to form different clusters.

Grids

Grid plots give us more control over visualizations and can plot several assorted graphs with a single line of code.

a. Facet Grid

  • Suppose we want to plot the age distribution of males and females in all three classes of tickets. We would then have a total of 6 graphs.
sns.FacetGrid(df, col = 'Pclass', row = 'Sex').map(sns.distplot, 'Age')
Fig. 16: Distribution plot of ‘Age‘ for classes of ‘Sex’ and ‘P-class’
  • Facet grids provide very clear graphs as per requirements.
  • sns.FacetGrid(col = ‘col’, row = ‘row’, data = data) provides an empty grid with one cell for every combination of the unique categories in col and row. Later, we can map different plots and common variables onto it to compare the variations.

Regression Plots

This is a more advanced statistical plot that provides a scatter plot along with a linear fitting on the data.

sns.lmplot(x = 'Age', y = 'PassengerId', data = df, hue = 'Sex')
Fig. 17: Regression Plot between Age and Passenger ID for males and females. | Disclaimer: There is no significance in regressing age and passenger id. It is just for the purpose of understanding the visualization.

Fig. 17 displays the linear regression fitting between Passenger ID and Age for both males and females.
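
The fitted lines themselves are not returned as numbers by lmplot. If the slope and intercept are needed, one option is to compute a least-squares line per group with NumPy, as in this sketch on made-up 'Age' and 'Fare' values:

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data; only the column names follow the article.
df = pd.DataFrame({
    "Age": [20, 30, 40, 50, 20, 30, 40, 50],
    "Fare": [11.0, 21.5, 29.0, 41.0, 30.0, 52.0, 69.0, 91.0],
    "Sex": ["male"] * 4 + ["female"] * 4,
})

# One straight-line fit (degree 1 polynomial) per category of 'Sex'.
fits = {}
for sex, group in df.groupby("Sex"):
    slope, intercept = np.polyfit(group["Age"], group["Fare"], deg=1)
    fits[sex] = (slope, intercept)
    print(sex, round(slope, 3), round(intercept, 3))

# The matching plot: one scatter and one fitted line per category.
# ci=None skips the confidence band, which keeps the sketch fast.
g = sns.lmplot(data=df, x="Age", y="Fare", hue="Sex", ci=None)
```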

Wrap Up

In this article, we have seen 14 different visualization techniques using seaborn.

I believe data visualization enhances our understanding and potential for interpreting data. It gives us better skills to represent data, impute missing values, identify outliers, detect anomalies, and a lot more.

Data analysts are like cops who need to interrogate data and extract information from it. It is essential to have the right tools to do the job. Therefore, I hope this article serves you as a tool for interrogating your data.

For the Guide for Exploratory data analysis, visit-

For more content related to data science, machine learning, and programming, please visit my YouTube channel.

Happy Learning!
