What you’ll learn
Variables and datatypes
and much much more

5 sections • 153 lectures • 45h 2m total length
spyder part 2
variables and data types
Section 2

This course includes:
45 hours on-demand video
Full lifetime access
Access on mobile and TV
Certificate of completion

Welcome to this course on Python for Data Science. This is a four-week course in which we are going to teach you some very basic programming aspects of Python. And since this is a course that is geared towards data science, towards the end of the course, based on what has been taught, we will also show you two different case studies: one is what we call a function approximation case study, and the other is a classification case study. We will then tell you how to solve those case studies using the programming platform that you have learned. So, in this first introductory lecture, I am just going to talk about why we are looking at Python for data science.

(Refer Slide Time: 01:10)

So, to look at that, first we are going to look at what data science is. This is something that you would have seen in other videos, in courses on NPTEL and in other places. Data science is basically the science of analyzing raw data and deriving insights from this data. You could use multiple techniques to derive these insights: simple statistical techniques, more complicated and more sophisticated machine learning techniques, and so on.

Nonetheless, the key focus of data science is in actually deriving these insights, using whatever techniques you want to use. Now, there is a lot of excitement about data science, and this excitement comes because it has been shown that you can get very valuable insights from large data. You can get insights about how different variables change together, how one variable affects another variable, and so on, which with large data are not easy to see through very simple computation.

So, you need to invest some time and energy into understanding how you could look at this data and derive these insights from it. From a utilitarian viewpoint, if you look at data science in industry, doing proper data science allows companies to make better decisions. These decisions could be in multiple fields; for example, companies could make better purchasing decisions, better hiring decisions, better decisions in terms of how to operate their processes, and so on.

So, when we talk about decisions, the decisions could be across multiple verticals in an industry. And data science is not only useful from an industrial perspective; it is also useful in the sciences themselves, where you look at lots of data to model your system or to test your hypotheses or theories about systems, and so on. So, when we talk about data science, we start by assuming that we have a large amount of data for the problem of interest. We are going to inspect the data, we are going to clean and curate the data, and then we will do some transformation of the data, modeling, and so on, before we can derive insights that are valuable to the organization, or test a theory, and so on.

(Refer Slide Time: 03:47)

Now, coming to a more practical viewpoint of what we do once we have data. I have these four bullet points, which roughly tell you the steps you would follow if you were solving a data science problem. You start with just having data: someone gives you data, and you are trying to derive insights from it. So, the very first step is really to bring this data into your system. You have to read the data so that it comes into the programming platform and you can use it. Now, data could be in multiple formats; you could have data in a simple Excel sheet or in some other format.
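As a rough illustration of this reading step, here is a minimal sketch that loads comma-separated data using only Python's standard library. The column names, the sample values, and the helper name `read_records` are made up for illustration; the lecture does not say which library or file format the course uses.

```python
import csv
import io

# Hypothetical sample data standing in for a file on disk.
SAMPLE_CSV = """name,city,age
Asha,Chennai,34
Ravi,Pune,29
Meena,Delhi,
"""

def read_records(source):
    """Read CSV text from a file-like object into a list of dicts."""
    return list(csv.DictReader(source))

# io.StringIO lets the sketch run without a real file; with a file on
# disk you would pass open("data.csv", newline="") instead.
records = read_records(io.StringIO(SAMPLE_CSV))

print(len(records))        # number of data rows read
print(records[0]["name"])  # value of the "name" column in the first row
```

Every value arrives as a string, and the empty `age` in the last row comes through as an empty string, which is exactly the kind of thing the later processing steps have to deal with.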

So, we will teach you how to pull data into your programming platform from multiple data formats. If you think about how you are going to solve a problem, the first step is simply to read the data. Once you have read the data, many times you have to do some processing with it, because you could have data that is not correct. For example, we all know that there are 10 digits in a mobile number. If there is a column of mobile numbers and there is one row where there are just five digits, then you know there is something wrong. This is a very simple check; in real data processing this gets much more complicated.
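The mobile number check described above can be sketched in a few lines. This is only an illustration under the lecture's stated assumption that a valid number has exactly 10 digits; the function name `is_valid_mobile` and the sample values are hypothetical.

```python
def is_valid_mobile(value):
    """Return True when value consists of exactly 10 ASCII digits."""
    text = value.strip()
    return len(text) == 10 and text.isascii() and text.isdigit()

# Hypothetical column of mobile numbers, read in as strings.
mobiles = ["9876543210", "98765", "9123456780", "98765abc10"]

# Collect (row index, value) for every entry that fails the check.
bad_rows = [(i, m) for i, m in enumerate(mobiles) if not is_valid_mobile(m)]
print(bad_rows)
```

Reporting the row index along with the offending value makes it easier to decide afterwards whether to correct the entry or discard the record.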

So, once you bring the data in and try to process it, you are going to get errors such as this. How do you remove such errors, and how do you clean the data? That is one activity that usually precedes doing more useful things with the data. This is not the only issue that we look at; there could also be data that is missing.

So, for example, there is a variable for which you get a value in multiple situations, but in some situations the value is missing. What do you do with this data? Do you throw the record away, or do you do something to fill in the data, and so on? These are all data processing and cleaning steps. In this course, we will tell you about the tools that are available in Python so that you can do this data processing, cleaning, and so on.
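The two choices mentioned here, throwing the record away or filling in the value, might look like the sketch below. Filling with the mean of the observed values is just one possible strategy, chosen here as an assumption; the sample values are hypothetical and `None` stands in for a missing entry.

```python
from statistics import mean

# Hypothetical column with two missing entries.
ages = [34, None, 30, 41, None]

# Option 1: discard the records whose value is missing.
dropped = [a for a in ages if a is not None]

# Option 2: fill each gap with the mean of the observed values
# (an assumed strategy; other fill rules are equally possible).
fill_value = mean(dropped)
filled = [a if a is not None else fill_value for a in ages]

print(dropped)
print(filled)
```

Dropping shrinks the data set, while filling keeps every record at the cost of introducing values that were never observed, so the right choice depends on the problem.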

Now, what you have done at this point is this: you have been able to get the data into the system, and you have been able to process and clean the data and get to a data file or data structure that is reasonably complete, so that you think you can work with this data set. At this point, you will try to summarize the data. A very simple way of summarizing is to compute very simple statistical measures; you could, for example, compute the median, mode, or mean of a particular column.
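A minimal sketch of such a summary, using Python's standard `statistics` module on a hypothetical column of marks, could look like this. The sample variance is included as one more simple measure of spread; the data values are made up for illustration.

```python
import statistics

# Hypothetical numeric column.
marks = [72, 85, 85, 90, 64, 78]

summary = {
    "mean": statistics.mean(marks),
    "median": statistics.median(marks),
    "mode": statistics.mode(marks),
    # Sample variance (divides by n - 1).
    "variance": statistics.variance(marks),
}

for name, value in summary.items():
    print(name, value)
```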

So, those are simple ideas for summarizing the data; you could also compute the variance, and so on. We are going to teach you how to use these notions of statistical quantities to summarize the data. Once you summarize the data, another activity that is usually taken up is what is called visualization. Visualization means you look at the data more pictorially to get insights about it before you bring heavy-duty algorithms to bear on it. This is a creative aspect of data science: the same data could be visualized by multiple people in multiple ways, and some visualizations are not only eye-catching but also much more informative than others.
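To give a feel for looking at data pictorially, here is a small sketch that draws a text histogram with only the standard library. It is a stand-in for a real plotting library, which the lecture does not name; the function `text_histogram`, the bin width, and the sample values are all assumptions for illustration.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Return one line per non-empty bin, with one '#' per value in it."""
    # Map each value to the lower edge of its bin and count per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(counts):
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        lines.append(f"{label} {'#' * counts[start]}")
    return lines

# Hypothetical integer marks.
marks = [72, 85, 85, 90, 64, 78, 91, 67, 88]

lines = text_histogram(marks, bin_width=10)
for line in lines:
    print(line)
```

Even this crude picture shows at a glance which range holds the most values, something that is harder to see from the raw list.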

So, this notion of plotting the data so that some of its attributes or aspects are made apparent is this notion of visualization. There are tools in Python that we will teach you for visualizing this data. So, at this point you have taken the data, you have cleaned it and got a set of data points or a data structure that you can work with, and you have done some basic summary of the data that gives you some insights. You have also looked at it more visually and got some more insights. But when you have a large amount of data, big data, the last step is really deriving those insights which are not readily apparent either through visualization or through a simple summary of the data.
