
Data Scientist at HiLabs | Enthusiastic ML practitioner | IIT Kanpur | Drama Lover

Tools for the cops who investigate data and extract information and trends from it.

Green Colour represents new beginnings and growth. It also signifies renewal and abundance.

Data Visualization plays a very important role in Data mining. Many data scientists spend their time exploring data through visualization. To accelerate this process, we need good documentation of all the plots.

Even plentiful resources can’t be transformed into valuable goods without planning and architecture. Therefore, I hope this article gives you a well-organized overview of all the plots and their documentation.


  1. Introduction
  2. Know your Data
  3. Distribution Plots
    a. Dist-Plot
    b. Joint Plot
    c. Pair Plot
    d. Rug Plot
  4. Categorical Plots
    a. Bar Plot
    b. Count Plot
    c. Box Plot
    d. Violin Plot
  5. Advanced Plots
    a. Strip Plot
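
As a rough illustration of what the categorical plots in this outline summarize, the sketch below computes the underlying numbers with Python's standard library only. It assumes that a count plot shows category frequencies, a bar plot shows a per-category mean, and a box plot is built on quartiles. The `records` data and all function names are made up for illustration; the article itself may well use a plotting library instead.

```python
from collections import Counter, defaultdict
from statistics import mean, quantiles

# Hypothetical (category, value) records, not data from the article.
records = [("Sat", 1.0), ("Sat", 2.0), ("Sat", 3.0), ("Sat", 4.0), ("Sat", 5.0),
           ("Sun", 10.0), ("Sun", 20.0)]


def group_values(rows):
    """Collect the numeric values belonging to each category."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return dict(groups)


def count_plot_data(rows):
    """Number of observations per category (what a count plot draws)."""
    return dict(Counter(category for category, _ in rows))


def bar_plot_data(rows):
    """Mean value per category (the aggregate assumed for a bar plot)."""
    return {category: mean(values) for category, values in group_values(rows).items()}


def box_plot_data(values):
    """Quartiles and interquartile range (the box of a box plot)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


counts = count_plot_data(records)
means = bar_plot_data(records)
sat_box = box_plot_data(group_values(records)["Sat"])
print(counts, means, sat_box)
```

Keeping the statistics separate from the drawing makes it easier to check what a plot is telling you before you style it.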

4 things you must do before any interview.

Created by Author

There is a lot of hustle for making a good resume or CV for any post. And this hustle is justified as your resume is your first impression on the interviewer. It should be neat, clean, readable, and should contain all things you did professionally.

While I was researching how to make a good resume, I found plenty of resources available. If you want good resume templates, you may log in to Overleaf. If you need action verbs for your resume, you may visit here (document from Harvard).

But the question is, is a good…

Overfitting and underfitting are very common problems, and we have specific methods and tools to deal with them. However, the basic science behind all of these methods is the same, and it is worth discussing too.

Photo by Isabella and Louisa Fischer on Unsplash

The Data Science community is blessed with many platforms hosting lots of predictive modeling problems. This has simplified the path for beginners to excel and attain proficiency in this field. We are not going to talk about those platforms; instead, we will talk about something that lets us end our journey by training an “optimal” model. The term “optimal” here means that the accuracy of the model is similar to the base accuracy.

The most common problem we face while training a model is overfitting or underfitting the data. We have, to some extent, the power to control it, but…
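
To make the two failure modes concrete, here is a minimal sketch that compares training and validation error for three toy models: a constant (underfit), a straight line, and a nearest-neighbour memorizer (overfit). The data, the choice of models, and every name below are assumptions for illustration only, not the method the article goes on to describe.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function x -> prediction)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_mean(xs, ys):
    """Underfit: ignore x and always predict the average target."""
    y_mean = sum(ys) / len(ys)
    return lambda x: y_mean


def fit_line(xs, ys):
    """Reasonable fit: ordinary least squares with one feature."""
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return lambda x: slope * x + intercept


def fit_nearest(xs, ys):
    """Overfit: memorize the training set and return the closest point's target."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]


# Made-up data: y = 2x plus alternating +1/-1 "noise" on the training points.
train_x = [float(i) for i in range(8)]
train_y = [2 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_x)]
valid_x = [i + 0.4 for i in range(7)]
valid_y = [2 * x for x in valid_x]

fitters = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
errors = {}
for name, fitter in fitters.items():
    model = fitter(train_x, train_y)
    errors[name] = (mse(model, train_x, train_y), mse(model, valid_x, valid_y))
    print(f"{name:8s} train MSE = {errors[name][0]:.3f}  validation MSE = {errors[name][1]:.3f}")
```

The memorizer reaches zero training error because it reproduces the noise, yet it does worse than the line on unseen points, while the constant model is poor on both sets.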

Collected experience of IIT Kanpur students

Photo by Clem Onojeghuo on Unsplash


Data Science, Data Analysis, Business Analyst, Machine Learning, Database Engineer, Deep Learning, Natural Language Processing… you must have heard these terms. And this is why you are here! This field is growing rapidly. There are lots of opportunities and lots of things to learn and explore.

I am a graduate of the Indian Institute of Technology, Kanpur. I have a great passion for data science and recently got placed as a Data Scientist in a Healthcare Startup.

What I found before and after working in this field for 3 months is:

  1. What we prepare, as a…

Experience of IIT Kanpur, one of the prestigious colleges in India


Brief Introduction to my Background

I am a final-year undergraduate at the Indian Institute of Technology, Kanpur, in the Department of Mechanical Engineering, with a minor in the Department of Industrial Engineering and Management.

You may find it interesting how, coming from a core engineering field, I landed a job as a Data Scientist.

In the campus placement season (Dec 2020), I got placed as a Data Scientist at HiLabs. HiLabs has a healthcare-focused AI solution that automatically detects data errors without human intervention. It is a combination of Big Data, AI, and medical ontologies.

How did I land there?

The story behind how I landed as a Data Scientist…

These five obstacles may occur when you train a linear regression model on your data set.

Let's go from Yellow, the color of danger to Yellow, the color of sunshine, and happiness. (Photo by Casey Thiebeau on Unsplash)

Linear Regression is one of the simplest machine learning algorithms. Its interpretability and ease of training make it a natural first step in Machine Learning. Being a little less complicated, Linear Regression also serves as one of the fundamental concepts for understanding more complex algorithms.

To learn what linear regression is, how we train it, how we obtain the best-fit line, how we interpret it, and how we assess the accuracy of the fit, you may visit the following article.
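
For readers who want a quick reminder before following that link, the sketch below fits a best-fit line with ordinary least squares for a single feature and scores it with R². This is the textbook closed-form solution, not necessarily how the linked article presents it, and the sample points and function names are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up points lying close to y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.0, 8.1, 9.9]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r2:.4f}")
```

The slope is interpreted as the expected change in y for a one-unit change in x, and R² close to 1 indicates that the line explains most of the variation in y.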

Once you understand the basic intuition behind Linear regression, certain concepts make it more fascinating and more fun. These also provide a deep…

Relative Order Test for testing the existence of a Trend in a Time series

Time passes faster for your face than for your feet (assuming you’re standing up). Einstein’s theory of relativity dictates that the closer you are to the center of the Earth, the slower time goes — and this has been measured. At the top of Mount Everest, a year would be about 15 microseconds shorter than at sea level. (Photo by Nathan Dumlao on Unsplash)

A time series comprises four major components. A trend. A seasonal component. A cyclic component. And a stochastic/ random component.

You can have a recap of all the basics of a time series from my following article.

We extract all these components and analyze them to get information from a time series. There are lots of standard methods to extract the components from a time series.

But all these components may or may not be present in a time series together. Therefore, before estimating these components, we need to first check for their existence. …
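
Since this preview stops before the test itself, here is a minimal sketch of the relative order test in its common textbook form, which is an assumption about what the article covers. It counts the discordant pairs Q (an earlier value larger than a later one), computes τ = 1 − 4Q / (n(n − 1)), and standardizes it with Var(τ) = 2(2n + 5) / (9n(n − 1)) under the hypothesis of no trend. The function name and the 1.96 cut-off (a 5% two-sided level) are illustrative choices, and ties are not treated specially.

```python
import math


def relative_order_test(series):
    """Return (Q, tau, z) for the relative order test on a sequence of numbers."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    # Q: number of pairs (i, j) with i < j where the earlier value is larger.
    q = sum(1 for i in range(n) for j in range(i + 1, n) if series[i] > series[j])
    tau = 1 - 4 * q / (n * (n - 1))
    var_tau = 2 * (2 * n + 5) / (9 * n * (n - 1))
    z = tau / math.sqrt(var_tau)
    return q, tau, z


rising = [float(t) for t in range(1, 11)]
falling = rising[::-1]

q_up, tau_up, z_up = relative_order_test(rising)
q_down, tau_down, z_down = relative_order_test(falling)
print(f"rising:  Q = {q_up}, tau = {tau_up:.2f}, z = {z_up:.2f}")
print(f"falling: Q = {q_down}, tau = {tau_down:.2f}, z = {z_down:.2f}")
```

A |z| above 1.96 would lead us to reject the no-trend hypothesis at the 5% level; a positive τ points to a rising trend and a negative τ to a falling one.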

DR is one of the most critical steps of the predictive modeling problem. The world is generating a large amount of data with large dimensions. Hence it is crucial to optimize the dimensional space of the data.

Pink is a light red hue and is typically associated with love and romance. People associate the color with qualities that are often thought of as feminine, such as softness, kindness, nurturance, and compassion (Photo by Isi Parente on Unsplash)

What is Dimensionality Reduction (DR)?

Suppose you want to solve a predictive modeling problem, and for the same, you start to collect data. You would never know what exact features you want and how much data is needed. Hence, you go for the upper limit, and you collect all possible features and observations.

Consequently, you realize that you have collected a large amount of data, and these extra features increase both noise and time.

  1. Noise: There may be some features that the model finds irrelevant. Hence, they just add noise to the model.
  2. Time: The time I am talking about is computational time. For…
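
As a small, hedged illustration of trimming features that carry no information, the sketch below drops columns whose variance falls under a threshold. This is only one simple filter among many dimensionality reduction techniques and is not claimed to be the approach the article takes; the threshold, the feature names, and the data are all hypothetical.

```python
from statistics import pvariance


def drop_low_variance(rows, names, threshold=1e-3):
    """Keep only the columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows))
    keep = [i for i, column in enumerate(columns) if pvariance(column) > threshold]
    reduced_rows = [[row[i] for i in keep] for row in rows]
    kept_names = [names[i] for i in keep]
    return reduced_rows, kept_names


# Hypothetical observations: the middle feature never changes.
names = ["age", "constant_flag", "score"]
rows = [
    [1.0, 5.0, 0.1],
    [2.0, 5.0, 0.4],
    [3.0, 5.0, 0.2],
    [4.0, 5.0, 0.9],
]

reduced_rows, kept_names = drop_low_variance(rows, names)
print(kept_names)
print(reduced_rows)
```

Fewer columns also means less work per training step, which is the computational-time benefit mentioned in the list above.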

Time Series (TS) is considered one of the lesser-known skills in the data science space. This article is a starter on the concepts of TS, with a lot more to come.

Photo by Curtis MacNewton on Unsplash

Ever since we learned that data contains trends and that we can extract knowledge from it, we have been collecting it. In some instances, we try to find trends in data that does not cover a long enough period. Hence, we do not find any trend with respect to time.

But now, after decades of data collection, we can find at least some patterns with respect to time, and finding them is called Time Series analysis.

What is a Time Series?

A series of observations recorded sequentially over a period of time, i.e., a collection of observations recorded along with their timestamps, is called a Time Series.
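
In code, this definition can be sketched as a list of (timestamp, value) pairs kept in time order. The readings and helper names below are invented for illustration and use only Python's standard library.

```python
from datetime import date

# Hypothetical daily readings, deliberately stored out of order.
observations = [
    (date(2021, 1, 3), 14.0),
    (date(2021, 1, 1), 12.0),
    (date(2021, 1, 2), 13.5),
]


def as_time_series(pairs):
    """Order the (timestamp, value) pairs by timestamp."""
    return sorted(pairs, key=lambda pair: pair[0])


def is_regular(series):
    """True when consecutive timestamps are equally spaced."""
    gaps = {later[0] - earlier[0] for earlier, later in zip(series, series[1:])}
    return len(gaps) <= 1


series = as_time_series(observations)
values = [value for _, value in series]
print(values, is_regular(series))
```

The ordering is what separates a time series from an ordinary collection of observations: shuffling the rows changes its meaning.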


This is an introduction to the young and fast-growing field of data mining (also known as knowledge discovery from data, or KDD for short). It focuses on fundamental data mining concepts and techniques for discovering interesting patterns from data in various applications.

Source: Pixabay

The world that we see today has automated data collection tools, database systems, the World Wide Web, and a computerized society. This has resulted in explosive growth in data, from terabytes to petabytes.

We are drowning in the ocean of data but starving for knowledge.

Huge velocity, volume, and variety of data are what our new age has provided us. Cheaper technology, mobile computing, social networking, and cloud computing have triggered this data storm.

These are the reasons why conventional methods fade away and we need some novel methods like Data mining to process the new era of…

Aayush Ostwal
