10 Steps for people who want to pursue a career in Data Science.
Everyone keeps talking about Data Science being the “sexiest job of the 21st century”. But is it, really?
NO! (Sorry to disappoint.) It’s not sexy if you’re not in love with data and aren’t a data-driven decision maker.
Data Science is a blend of maths, statistics, and computer science. It also involves programming, database management, and modelling. Without a good grounding in several of these, and the persistence not to give up too easily, you are unlikely to succeed.
Step 1. Satisfy the Prerequisites
Before you start, ask yourself this question:
“Am I good with maths? Especially multivariable calculus and linear algebra?”
If yes, BRAVO! If not, that’s the first thing you have to work on, starting today. If your math background extends to multivariable calculus and linear algebra, you’ll have enough to understand almost all of the probability, statistics, and machine learning the job requires.
Here are some useful links for you to follow:
- Multivariable Calculus: Multivariable Calculus learning recommendation (multivariable calculus is very useful when it comes to probability)
- Linear Algebra: Enroll in the Harvard edX course. (Linear algebra is the foundation of machine learning, so it pays to master it.)
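To see why linear algebra sits underneath machine learning, here is a minimal sketch (my own illustration, not taken from the course) of fitting a straight line by least squares. Writing the model as X w ≈ y, the best weights solve the normal equations (XᵀX) w = Xᵀy; with a single feature this is a 2×2 system, solved here with Cramer's rule in plain Python. The function name `fit_line` is hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y ≈ w0 + w1 * x."""
    n = len(xs)
    # Entries of X^T X for the design matrix X with rows [1, x_i]
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # Entries of X^T y
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Solve [[n, sx], [sx, sxx]] @ [w0, w1] = [sy, sxy] with Cramer's rule
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be equal")
    w0 = (sy * sxx - sx * sxy) / det
    w1 = (n * sxy - sx * sy) / det
    return w0, w1


print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # → (1.0, 2.0)
```

For larger problems you would let a library such as NumPy solve the system, but the algebra is the same.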
You also need some programming background to begin. Programming is not as hard as it sounds; it’s the coolest subject in the world to learn.
Personally, I prefer Python for data scientists. But why is Python so important? Check out...
How do I learn to code?
You can learn to code in so many different ways now that we have something called the internet. (I’ll leave that to you.)
How do I learn Python?
Here is a (somewhat long) list of resources:
- DataCamp: Intro to Python for Data Science
- The Complete Python Masterclass: Learn Python From Scratch
- Learn Python — Best Python Tutorials
- Learn Python for Data Science — Dataquest
- Collection of 53 free Python books: Python Programming Books. Includes all the books mentioned below.
- Python: Learn Python in One Day and Learn It Well
- Codecademy: Python
- Python Step by Step: Build a Data Analysis Program (Disclosure: Added by author)
- Learning Python, 5th Edition
- Learn Python the Hard Way (http://learnpythonthehardway.org/)
- Python: The Essential Reference (http://www.informit.com/store/pr...)
- How to Think like a Computer Scientist (http://greenteapress.com/thinkpy...)
- Learning Python — 4th Edition (http://www.rmi.net/~lutz/about-l...)
- Byte of Python (http://www.swaroopch.org/notes/P...)
- Beginning Python (http://www.apress.com/9781590599822)
- The Python Standard Library by Example
- Python in a Nutshell (http://shop.oreilly.com/product/...)
- Head First Python
- Core Python Programming (http://corepython.com/)
- MIT’s introductory course (Introduction to Computer Science and Programming)
- Google for Education Python course: Google’s Python Class
- Automate the Boring Stuff with Python: Practical Programming for Total Beginners
- Data Science from Scratch: First Principles with Python
- Learning to Program Using Python, 2nd Edition
- JavaTpoint: an online Python tutorial for beginners
- http://www.learnbay.in: online instructor-led training in basic and advanced Python
Python Kindle guides:
- Python Tricks: A Buffet of Awesome Python Features
- Python (2nd Edition): Learn Python in One Day and Learn It Well. Python for Beginners with Hands-on Project. (Learn Coding Fast with Hands-On Project Book 1)
- Learning Python: Powerful Object-Oriented Programming
- Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Why is Python a language of choice for data scientists?
Let’s measure the pros and cons.
Is Python the most important programming language to learn for aspiring data scientists and data miners?
Read the full answer on Quora, and this Kindle book.
After mastering Python, you can move on to learning R as well. R is widely used by statisticians because it is powerful for dedicated statistical tasks, while Python is more versatile and will connect you more closely to production-level work.
Step 2. Plug Yourself Into the Community
As I mentioned in my previous article, you are nothing without your network. Install Meetup and Eventbrite to find data science events near you, and set aside some time each week to attend one or two networking events. Learn about data science live, and meet data scientists and other aspiring data scientists.
If you really want to know the science behind networking and the importance of building professional relationships, read this book.
Also, start reading data science blogs and following influential data scientists. Here’s one example:
What are the best, insightful blogs about data, including how businesses are using data?
(This is important. Knowing how businesses use data is a nice-to-have skill for a data scientist.)
Jeff Hammerbacher answered this question on Quora like no one else could, and I quote:
Popular Culture
- Nate Silver: http://www.fivethirtyeight.com
- Carl Bialik: http://blogs.wsj.com/numbersguy/
- Freakonomics: http://freakonomics.blogs.nytime...
Databases and Data Infrastructure
- Curt Monash blogs at http://www.dbms2.com and primarily discusses software for data management.
- Tony Bain writes in a similar vein at http://www.tonybain.com.
- Dan Abadi is a top young database researcher and blogs at http://dbmsmusings.blogspot.com.
- Joe Hellerstein, one of the most accomplished database researchers out there, blogs at http://databeta.wordpress.com.
- Mike Stonebraker, David DeWitt, Sam Madden, and some of the other professors behind the Vertica technology blog at http://databasecolumn.vertica.com.
- Donald Feinberg is one of the most respected analysts in the field; he blogs for Gartner at http://blogs.gartner.com/donald-....
- James Kobelius holds a similar position at Forrester; he blogs at http://blogs.forrester.com/infor....
- Merv Adrian is an independent analyst with a deep knowledge of the space. He blogs at http://mervadrian.wordpress.com.
Here’s the best A-Z ultimate guide to data warehousing, the DW beginner book you’ve been searching for! (Jayasekara.Blog)
Machine Learning and Data Mining
- John Langford, who works at Yahoo! Research NYC, blogs at http://hunch.net.
- Olivier Bousquet, one of my favorite machine learning researchers, has a (dead) blog at http://ml.typepad.com.
- Mike Driscoll blogs at http://dataspora.com/blog.
- Gordon Linoff and Michael Berry blog at http://www.data-miners.com/blog.
It’s always good to know an ML or deep learning algorithm that you can impress your friends with.
Detecting faces in a photograph is easily solved by humans, although it has historically been challenging for computers… (Jayasekara.Blog)
Face detection is a computer technology that is being applied for many different applications that require the… (Jayasekara.Blog)
Data Visualization
- Visual Complexity: http://www.visualcomplexity.com/...
- Jeff Heer: http://jheer.org/blog
Random Updates
- http://www.bigdatalove.com: Chris Smith in Utah
What is your source of machine learning and data science news? Why?
Here are some data science news aggregators:
- /r/datascience — Data science subreddit
- Towards Data Science
- DataTau — HackerNews for data science (started December 2013)
- Data Science Weekly
Step 3. Set Up and Learn to Use Your Tools
Python
- Install Python, IPython, and related libraries (guide)
- Install Jupyter / Anaconda
- Watch the above-mentioned tutorials and Python Data Science courses.
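Once everything is installed, a quick sanity check can save confusion later. This is a minimal sketch that asks Python whether a few packages can be found, without actually importing them. The package list is an assumption, so edit it to match what you installed; the code uses only the standard library (`importlib.util.find_spec`).

```python
import importlib.util
import sys

# Assumed list of commonly used packages; adjust it to the tools you installed.
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn", "IPython", "jupyter"]


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_packages(PACKAGES).items():
        print(f"{name:12} {'installed' if found else 'missing'}")
```

Running it from the same environment (or Anaconda prompt) you plan to use in Jupyter tells you which installs actually landed there.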
R
R and Python Collaboration
This is somewhat like “Beyond the Wall” in Game of Thrones: everyone knows it exists, but everyone is afraid to go there. Learning how to combine the two, however, can give you massive leverage.
Why is it always R v Python? Why can’t we admit that both are unique in their own way and we should know how to handle…
Atom Editor for Python/R
SQL
- How do I learn SQL? What are some good online resources, like websites, blogs, or videos? (You can practice using Python’s built-in sqlite3 module.)
- SQL Kindle Guide
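As a starting point for practicing SQL from Python, here is a minimal sketch using the standard-library `sqlite3` module with an in-memory database. The `orders` table, its columns, and the sample rows are made up for illustration; the query shows the common GROUP BY / ORDER BY / LIMIT pattern.

```python
import sqlite3
from contextlib import closing


def top_customers(orders, limit=2):
    """Return (customer, total) pairs for the highest-spending customers."""
    # closing() makes sure the in-memory connection is closed afterwards.
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
        query = """
            SELECT customer, SUM(amount) AS total
            FROM orders
            GROUP BY customer
            ORDER BY total DESC
            LIMIT ?
        """
        return conn.execute(query, (limit,)).fetchall()


# Made-up sample data
orders = [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 45.0), ("bob", 7.5)]
print(top_customers(orders))  # → [('alice', 50.0), ('carol', 45.0)]
```

Using `?` placeholders instead of string formatting is the habit to build early, since it keeps queries safe from SQL injection.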
Step 4. Learn Probability and Statistics
Be sure to go through a course that involves heavy application in R or Python. Knowing probability and statistics will only really be helpful if you can implement what you learn.
- Python Application: Think Stats (free pdf) (Python focus)
- R Applications: An Introduction to Statistical Learning (free pdf) (MOOC) (R focus)
- Print out a copy of the Probability Cheatsheet
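Implementing what you learn can be as simple as checking a textbook answer against a simulation. This minimal sketch computes the exact probability that two fair dice sum to a target by enumerating all 36 outcomes, then estimates the same probability with a seeded Monte Carlo simulation. The function names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import product


def exact_prob_sum(target):
    """Exact probability that two fair dice sum to target, by enumerating all outcomes."""
    outcomes = list(product(range(1, 7), repeat=2))
    hits = sum(1 for a, b in outcomes if a + b == target)
    return Fraction(hits, len(outcomes))


def simulated_prob_sum(target, trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability (seeded for reproducibility)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) + rng.randint(1, 6) == target)
    return hits / trials


print(exact_prob_sum(7))      # → 1/6
print(simulated_prob_sum(7))  # close to 0.1667
```

Increasing `trials` shrinks the gap between the estimate and the exact value, which is the law of large numbers in action.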
Step 5. Complete Harvard’s Data Science Course
See the following link to start a course or two.
Learn data science online today. Advance your career as a data scientist with free courses from the world's top… (www.edx.org)
Step 6. Do all of Kaggle’s Getting Started and Playground Competitions
According to William Chen, a data scientist and quantitative researcher at Two Sigma, “I would NOT recommend doing any of the prize-money competitions. They usually have datasets that are too large, complicated, or annoying, and are not good for learning. The competitions are available at Competitions | Kaggle”
Start by learning scikit-learn, playing around, reading through tutorials and forums on the competitions that you’re doing. Next, play around some more and check out the tutorials for Titanic: Machine Learning from Disaster for a binary classification task (with categorical variables, missing values, etc.)
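The Titanic task mixes categorical variables with missing values, and handling those is much of the early work. Below is a minimal plain-Python sketch of two common steps: median imputation for a numeric column and one-hot encoding for a categorical one. The records are made up, and the column names only mimic the Titanic data; in scikit-learn you would typically reach for `SimpleImputer` and `OneHotEncoder` instead.

```python
from collections import Counter
from statistics import median

# Made-up passenger records; None marks a missing value.
passengers = [
    {"Age": 22.0, "Sex": "male", "Embarked": "S"},
    {"Age": None, "Sex": "female", "Embarked": "C"},
    {"Age": 38.0, "Sex": "female", "Embarked": None},
    {"Age": 26.0, "Sex": "male", "Embarked": "S"},
]


def preprocess(rows):
    """Impute missing values and one-hot encode the port; return (feature rows, column names)."""
    # Numeric column: fill missing ages with the median of the observed ages.
    age_median = median(r["Age"] for r in rows if r["Age"] is not None)
    # Categorical column: fill missing ports with the most frequent observed port.
    ports = [r["Embarked"] for r in rows if r["Embarked"] is not None]
    most_common_port = Counter(ports).most_common(1)[0][0]
    categories = sorted(set(ports))

    features = []
    for r in rows:
        age = r["Age"] if r["Age"] is not None else age_median
        port = r["Embarked"] if r["Embarked"] is not None else most_common_port
        row = [age, 1.0 if r["Sex"] == "female" else 0.0]
        # One-hot encoding: one 0/1 column per port category.
        row += [1.0 if port == c else 0.0 for c in categories]
        features.append(row)
    names = ["Age", "IsFemale"] + [f"Embarked_{c}" for c in categories]
    return features, names


features, names = preprocess(passengers)
print(names)  # → ['Age', 'IsFemale', 'Embarked_C', 'Embarked_S']
for row in features:
    print(row)
```

In a real competition, compute the median and the category list on the training set only and reuse them for the test set, so no information leaks from test to train.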
Afterward, try some multi-class classification with Forest Cover Type Prediction. Then try a regression task with House Prices: Advanced Regression Techniques. Try out some natural language processing with Quora Question Pairs | Kaggle. Finally, try any of the other knowledge-based competitions that interest you!
Step 7. Learn Some Data Science Electives
Data science is an incredibly large and interdisciplinary field, and different jobs will require different skillsets. Here are some of the more common ones:
- Product Metrics will teach you about what companies track, what metrics they find important, and how companies measure their success: The 27 Metrics in Pinterest’s Internal Growth Dashboard
- Machine Learning: How do I learn machine learning? This is an extremely rich area with massive potential, and likely the “sexiest” area of data science today. Andrew Ng’s Machine Learning course on Coursera is one of the most popular MOOCs, and a great way to start! Andrew Ng’s Machine Learning MOOC
- A/B Testing is incredibly important to help inform product decisions for consumer applications. Learn more about A/B testing here: How do I learn about A/B testing?
- Visualization: I would recommend picking up ggplot2 in R to make simple yet beautiful graphics, and just browsing DataIsBeautiful (/r/dataisbeautiful) and FlowingData for ideas and inspiration.
- User Behavior: this set of blog posts looks useful and interesting (This Explains Everything: User Behavior)
- Feature Engineering — Check out What are some best practices in Feature Engineering? and this great example: http://nbviewer.ipython.org/github/aguschin/kaggle/blob/master/forestCoverType_featuresEngineering.ipynb
- Big Data Technologies — These are tools and frameworks developed specifically to deal with massive amounts of data. How do I learn big data technologies?
- Optimization will help you with understanding statistics and machine learning: Convex Optimization — Boyd and Vandenberghe
- Natural Language Processing — This is the practice of turning text data into numerical data whilst still preserving the “meaning”. Learning this will let you analyze new, exciting forms of data. How do I learn Natural Language Processing (NLP)?
- Time Series Analysis — How do I learn about time series analysis?
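For the Machine Learning elective, a good first exercise is implementing a simple algorithm yourself before using a library. Here is a minimal sketch of k-nearest neighbors classification (my choice of example, not necessarily one the course starts with): predict the majority label among the k closest training points. The toy data and the `knn_predict` name are hypothetical; `math.dist` needs Python 3.8+.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    train is a list of (feature_vector, label) pairs.
    """
    # Sort training points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up 2-D training data with two classes
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
         ((5.0, 5.0), "B"), ((6.0, 5.5), "B"), ((5.5, 6.0), "B")]
print(knn_predict(train, (1.2, 1.4)))  # → A
print(knn_predict(train, (5.2, 5.1)))  # → B
```

Because it compares raw distances, k-NN is sensitive to feature scale, so features are usually standardized first.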
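For the A/B Testing elective, the core calculation is often a significance test on two conversion rates. This is a minimal sketch of a two-sided, two-proportion z-test using the pooled standard error; it assumes large samples and independent users, and the experiment numbers are made up.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between the conversion rates of variants A and B.

    Returns (z, p_value), using the pooled proportion for the standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all-or-nothing conversions: the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up experiment: 200/2000 conversions for A, 250/2000 for B.
z, p = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here the p-value comes out below 0.05; in practice you should also fix the sample size before the test starts rather than stopping as soon as the result looks significant.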
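For the Feature Engineering elective, the idea is to derive new columns that expose structure a model might otherwise miss. This minimal sketch combines two raw distance columns into a straight-line distance and a ratio. The column names are hypothetical and are not taken from the linked notebook, and returning 0.0 for a zero denominator is an arbitrary choice.

```python
import math


def add_features(row):
    """Return a copy of row with extra derived features (hypothetical column names)."""
    out = dict(row)
    # Combine two raw distances into one straight-line (Euclidean) distance.
    out["straight_dist"] = math.hypot(row["horizontal_dist"], row["vertical_dist"])
    # Ratio feature; use 0.0 when the denominator is zero to avoid ZeroDivisionError.
    h = row["horizontal_dist"]
    out["vertical_ratio"] = row["vertical_dist"] / h if h else 0.0
    return out


print(add_features({"horizontal_dist": 30.0, "vertical_dist": 40.0}))
```

Whether a derived feature helps is an empirical question, so compare validation scores with and without it.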
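For the Optimization elective, gradient descent is the workhorse behind much of machine learning. This minimal sketch fits a line by repeatedly stepping against the gradient of the mean squared error. That objective is convex, so a small enough learning rate converges to the least-squares solution; the learning rate and step count are assumptions tuned for the toy data.

```python
def gradient_descent_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ≈ w0 + w1 * x by gradient descent on the mean squared error."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model
        errors = [(w0 + w1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(e^2) with respect to w0 and w1
        grad_w0 = 2 / n * sum(errors)
        grad_w1 = 2 / n * sum(e * x for e, x in zip(errors, xs))
        # Step downhill
        w0 -= lr * grad_w0
        w1 -= lr * grad_w1
    return w0, w1


print(gradient_descent_line([0, 1, 2, 3], [1, 3, 5, 7]))  # approximately (1.0, 2.0)
```

If the learning rate is too large the updates overshoot and diverge, which is why tuning it (or using line search, as covered in Boyd and Vandenberghe) matters.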
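For the Natural Language Processing elective, the simplest way to turn text into numbers is a bag-of-words count vector. This minimal sketch lowercases and tokenizes each document with a regular expression, builds a shared sorted vocabulary, and counts words per document. It keeps word frequencies but discards word order, so it preserves only part of the meaning.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Turn a list of documents into count vectors over a shared vocabulary."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for words a document does not contain.
    vectors = [[c[word] for word in vocab] for c in counts]
    return vocab, vectors


vocab, vectors = bag_of_words(["The cat sat.", "The cat saw the dog."])
print(vocab)    # → ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)  # → [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Refinements such as TF-IDF weighting or word embeddings build on the same idea of mapping text to vectors.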
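For the Time Series Analysis elective, one of the first tools you meet is the moving average, which smooths out short-term noise. This is a minimal sketch of a trailing simple moving average that keeps a running sum over a fixed window; positions without a full window are returned as `None`, which is a design choice rather than a standard.

```python
from collections import deque


def moving_average(values, window):
    """Trailing simple moving average; the first window - 1 positions are None."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    buf = deque()
    total = 0.0
    for v in values:
        buf.append(v)
        total += v
        # Drop the oldest value once the window is over-full.
        if len(buf) > window:
            total -= buf.popleft()
        out.append(total / window if len(buf) == window else None)
    return out


print(moving_average([10, 12, 14, 13, 15, 18], window=3))
```

The running sum makes each step O(1) regardless of the window size, instead of re-summing the whole window every time.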
Step 8. Do a Capstone Project / Side Project
Use your new data science and software engineering skills to build something that will make other people say wow! This can be a website, new way of looking at a dataset, cool visualization, or anything!
- What are some good toy problems (can be done over a weekend by a single coder) in data science? I’m studying machine learning and statistics, and looking for something socially relevant using publicly available datasets/APIs.
- How can I start building a recommendation engine? Where can I find an interesting data set? What tools/technologies/algorithms are best to build the engine with? How do I check the effectiveness of recommendations?
- What are some ideas for a quick weekend Python project? I am looking to gain some experience.
- What is a good measure of the influence of a Twitter user?
- Where can I find large datasets open to the public?
- What are some good algorithms for a prioritized inbox?
- What are some good data science projects?
Create public GitHub repositories, start a blog, and post your work, side projects, Kaggle solutions, insights, and thoughts! This helps you gain visibility, build a portfolio for your resume, and connect with other people working on the same tasks.
Step 9. Get a Data Science Internship or Job
- How do I prepare for a data scientist interview?
- How should I prepare for statistics questions for a data science interview?
- What kind of A/B testing questions should I expect in a data scientist interview and how should I prepare for such questions?
- What companies have data science internships for undergraduates?
- What are some tips to choose whether I want to apply for a Data Science or Software Engineering internship?
- When is the best time to apply for data science summer internships?
Check out The Official Quora Data Science FAQ for more discussion on internships, jobs, and data science interview processes! The data science FAQ also links to more specific versions of this question, like How do I become a data scientist without a Ph.D.? or the counterpart, How do I become a data scientist as a Ph.D. student?
You can also follow the steps in this article I wrote:
Never put off for tomorrow what you can do today (Jayasekara.Blog)
Step 10. Share your Wisdom Back with the Data Science Community
If you’ve made it this far, congratulations on becoming a data scientist! I’d encourage you to share your knowledge and what you’ve learned back with the data science community. Data Science as a nascent field depends on knowledge-sharing!
All these resources are thanks to William Chen and Google.
Thank you!
Keen to know more about me? Check out my website and get in touch if you have any questions about my articles.
Here’s an inspirational success story of an international student (that’s me)
Update: Check out Part 2 (How to think like a Data Scientist)
Important Reads:
Beyond the books and links already mentioned, here’s some special VIP stuff for those who stayed till the very end. You’re awesome!
- Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking
- HBR Guide to Data Analytics Basics for Managers (HBR Guide Series)
- 100 Data Analytics Interview Questions: The Most Important Questions and Answers
- Data Analytics: 5 Books in 1
- Data Science (MIT Press Essential Knowledge series)
- R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
- Data Science from Scratch: First Principles with Python