Basic tools you must know about in the field of Data Science

Hello there,
It's been a long time since I wrote something on this blog. Anyway, I know that the probability of meeting Geoffrey Hinton and the probability of getting a good number of readers on this blog are the same, i.e. ~0.000001.

In this post, I am going to tell you about some basic tools that are a must for a newbie in the field of Data Science.
Earlier, I used to think that knowledge of Data Science and Machine Learning comes from watching online courses such as Andrew Ng's (no hard feelings ;) ) or those from Princeton University, etc. I thought that knowing some Python libraries like SciPy, NumPy and scikit-learn, a few neural network models, and how to work with TensorFlow was sufficient for a good career in ML and Data Science. But all these things are just like the upper layer of the leg piece of a chicken (sorry, vegetarians, find something relatable for yourself). Yeah, the upper layer is crispy and tasty, but it is nowhere near as good as the inner layer of flesh.
It is very good that you have played with models for sentiment analysis, face recognition, data parsing and class prediction. But just using these models will get you a few projects on GitHub, a few T-shirts from hackathons and a desk job, but nothing else. If you want to innovate, to make something of your own, start from the basics rather than copying projects and code from GitHub and Stack Overflow. In saying so, I am not claiming to be superior. I might belong to the same category of Indian engineers who just want to get a well-paid job, buy a four-wheeler, marry their dream partner and go on a vacation.
I think I had the same mentality until I met some cool people in the Silicon Valley of India, who really think about and aim for the technical future of the country.
I used to think that knowing how to use the SVM model and its basics was sufficient, until I met a man with 12 years of experience in the field of Data Science.
Most of the work I did with Python code, he did with shortcut commands in Excel.
I talked about SVM; he explained the whole logic behind it to me. I talked about edge detection using a double differential; he explained the difference between the local and global properties that detect an edge. Most of what he said came back to the same few words: mean, median, variance, exponential, etc. That is when I realised that whatever knowledge I had about data science was just like the upper layer of the leg piece.
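To make the "double differential" idea a bit more concrete, here is a minimal sketch in plain Python, not what he showed me. It assumes a single row of pixel intensities with one sharp step in it; the names second_difference, find_edges and min_strength are my own, made up for illustration. The second difference at a point only looks at three neighbouring samples, so it is a purely local measurement, and an edge shows up where it flips sign.

```python
def second_difference(signal):
    """Discrete second derivative s[i-1] - 2*s[i] + s[i+1] at interior points."""
    return [signal[i - 1] - 2 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]


def find_edges(signal, min_strength=1.0):
    """Return indices i such that an edge lies between signal[i] and signal[i + 1].

    An edge is flagged where the second difference changes sign between two
    neighbouring positions and the swing is at least min_strength
    (min_strength is an assumed, hand-picked threshold).
    """
    d2 = second_difference(signal)
    edges = []
    for j in range(len(d2) - 1):
        sign_change = d2[j] * d2[j + 1] < 0
        strong_enough = abs(d2[j] - d2[j + 1]) >= min_strength
        if sign_change and strong_enough:
            edges.append(j + 1)  # d2[j] is centred on signal[j + 1]
    return edges


# Hypothetical row of pixel intensities: dark on the left, bright on the right.
row = [0, 0, 0, 0, 10, 10, 10, 10]
print(second_difference(row))  # [0, 0, 10, -10, 0, 0]
print(find_edges(row))         # [3]
```

This toy version only catches a sharp step; a gradual ramp or a noisy row would need smoothing first, which is exactly the kind of detail that the basics teach you.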

So the moral of the whole story is that if you want to pursue your interest in Data Science or ML, chase the mathematics and statistics behind it. Programming languages and pre-trained models are just simple tools to get started with. Don't limit yourself to using pre-trained models. Understand how the data is collected in the first place and how the classes are labelled. Learn properly how to read and generate different kinds of plots. Know all the basic commands and features of Excel.
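As a small first step in that direction, here is a rough sketch of why words like mean, median and variance keep coming up. It uses only Python's built-in statistics module; the summarize helper and the numbers are hypothetical, chosen just to show how a single outlier drags the mean away while the median barely moves.

```python
import statistics


def summarize(values):
    """Return the mean, median and population variance of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.pvariance(values),
    }


# Hypothetical measurements with one extreme outlier at the end.
readings = [30, 32, 35, 31, 33, 400]

summary = summarize(readings)
print(summary["mean"])    # 93.5
print(summary["median"])  # 32.5

# Drop the outlier and compare: the variance shrinks drastically.
cleaned = summarize(readings[:-1])
print(cleaned["variance"] < summary["variance"])  # True
```

The same three numbers are what Excel gives you with AVERAGE, MEDIAN and VAR.P, so you can check the result there too.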

With that, I will end this post with a famous quote from Ronald Coase:
"If you torture the data long enough, it will confess."



