A cheat sheet for becoming a data scientist for free, by Zeeshan-ul-Hassan Usmani:
- Understand Data: Data is useless, and can (and likely will) be misleading, without context. Data needs a story to tell a story. Data is like a color that needs a surface to prove its existence: the color red, for example, cannot exist without a surface. We see a red car, a red scarf, a red tie, red shoes, or red something. Similarly, data needs to be associated with its surroundings, context, methods, ways, and the whole life cycle in which it is born, generated, used, modified, executed, and terminated. I have yet to find a “data scientist” who can talk to me about the “data” without mentioning technologies like Hadoop, NoSQL, Tableau, or other sophisticated vendors and buzzwords. You need to have an intimate relationship with your data; you need to know it inside out. Asking someone else about anomalies in “your” data is like asking your wife how she gets pregnant. One distinct edge we had in our relationship with the UN, and in the software to secure schools from bombings, is our command over the underlying data. While the world talks about it using statistical charts and figures, we are the ones back home who experience it and live it in our daily lives; the importance, details, and appreciation of this data that we have cannot be found anywhere else. We are doing the same with our other projects and clients.
- Understand Data Scientist: Unfortunately, one of the most confused and misused terms in the data sciences field is “data scientist” itself. Some relate it to a mystic oracle who knows everything under the sun, while others reduce it to a statistical expert. For a few it is someone familiar with Hadoop and NoSQL, and for others it is someone who can perform A/B testing and use so many mathematical and statistical terms that they are hard to follow in executive meetings. For some it is visualization dashboards, and for others it is never-ending ETL processes. For me, a data scientist is someone who understands less about the science than those who create it and a little less about the data than those who generate it, but knows exactly how the two work together. A good data scientist is one who knows what is available “outside the box” and whom he needs to connect with or hire, or which technologies he needs to deploy, to get the job done; one who can link business objectives with data marts; and one who can simply connect the dots from business gains to human behaviors and from data generation to dollars spent.
- Listen to the weekly Partially Derivative podcast on data sciences and explore their Resources page
- University of Washington’s Intro to Data Science and Computing for Data Analysis are a good start
- Explore this GitHub Link and try to read as much as you can
- Check out Measure for America to gain an understanding of how data can make a difference
- Read the free book – Field Guide to Data Sciences
- Religiously follow this infographic on how to become a data scientist
- Read this blog to master your R skills
- Read this blog to master your statistics skills
- Read this wonderful practical intro to data sciences by Zipfian Academy
- Try to complete this open source data science Masters program
- Take this Machine Learning course on Coursera, taught by Coursera co-founder Andrew Ng himself
- By all means, complete this Data Science Specialization on Coursera, all nine courses, and the capstone
- If you lack a computer science background or want to move toward the programming side of data sciences, try to complete this Data Mining Specialization on Coursera
- Optional: depending on the industry you would like to work in, you may want to check out these industry-specific courses and links on data sciences: healthcare analytics (intro and specialization), education, performance optimization, and general academic research
- To understand the deployment side of data science applications, this cloud computing specialization on Coursera, along with the Amazon Web Services videos on YouTube and the free trainings, is a must
- Do these second-to-none courses on Mining Massive Datasets and Process Mining
- This link will lead you to the 27 best data mining books, all free
- Try to read Data Science Central once a day; articles like this one can save you a lot of time and discussion in interviews
- Try to compete in as many Kaggle competitions as you can
- As the cherry on the cake, these statistics-driven courses will help you stand out from all other applicants – Inferential Statistics, Descriptive Statistics, Data Analysis and Statistics, Passion Driven Statistics, and Making Sense of Data
- Follow these accounts on Twitter for Predictive Analytics: @mgualtieri, @analyticbridge, @doug_laney, @Hypatia_LeslieA, @hyounpark, @KDnuggets, and @anilbatra
- Follow these people and accounts on Twitter for Big Data and Data Sciences: Alistair Croll, Alex Popescu, @rethinkdb, Amy Heineike, Anthony Goldbloom, Ben Lorica, @oreillymedia, Bill Hewitt, Carla Gentry CSPO, David Smith, David Feinleib, Derrick Harris, DJ Patil, Doug Laney, Edd Dumbill, Eric Kavanagh, Fern Halper, Gil Press, Gregory Piatetsky, Hilary Mason, Jake Porway, James Gingerich, James Kobielus, Jeff Hammerbacher, Jeff Kelly, Jim Harris, Justin Lovell, Kevin Weil, Krish Krishnan, Manish Bhatt, Merv Adrian, Michael Driscoll, Monica Rogati, Neil Raden, Paul Philp, Peter Skomoroch, Philip (Flip) Kromer, Philip Russom, Paul Zikopoulos, Russell Jurney, Sid Probstein, Stewart Townsend, Todd Lipcon, Troy Sadkowsky, Vincent Granville, William McKnight, Yves Mulkers
The whole list will take 3 to 12 months to complete and will cost you absolutely nothing, and I can guarantee that with this skill set you would have to try very hard to remain jobless. Even if you complete half of it, send me a note and I will have something ready for you.
The ball is in your court. It doesn’t matter where you are or how much you can afford: if you want to make at least four times the average income of your countrymen, this is the way to do it, at least for the next 10 years (in which we will be generating 20 TB of data per person per year, versus 1 TB per person per year in the last 10 years).
I will write separate articles on Data Science Books (I’ve read 127 of those in the last six months) and MOOCs (I am celebrating my 25th MOOC certification today).
For everyone else data sciences is an opportunity; for me it’s a passion.
I tweet at @ZeeshanUsmani
*This article was written by Zeeshan-ul-Hassan Usmani on Kaggle. If you want to read more, please visit this link: https://www.kaggle.com/getting-started/44915