After six months of studying, we have ended up following a slightly different plan from the one we started with. Check out the original plan here.
Below is a summary of the two terms we completed, colour-coded according to whether the course was:
- Green: planned and completed
- Red: planned and not completed
- Blue: unplanned and completed
Term 1
Course | Description | Why we did / did not study this course |
CS106B | Programming abstractions in C++. | Algorithms and Programming are essential data science skills. |
NLP | Machine learning applied to the language problem. | NLP is an important application of machine learning and probabilistic programming. |
Statistics | Overview of statistics. | We thought this course would provide a good overview of statistical inference. |
Databases | Databases and database languages. | Accessing and querying databases are still essential skills in data analysis. |
Visualisation | Interactive browser visualisations. | Visualising data and using it to tell a story allows data science work to have an impact. |
Stats Work | Practice problems to supplement the Udacity Statistics course. | After the Udacity Statistics course, we wanted a more theoretical understanding of statistics and therefore opted for more statistics in term 2. |
Spanish | Spanish language course. | Very quickly, it became obvious we wouldn’t have the time to dedicate to foreign languages. |
Term 2
Course | Description | Why we did / did not study this course |
Data Science | Broad skillset for data science work. | Having already studied an excellent course from Harvard before, we were excited that they were offering a single course covering what we were interested in. |
Stats110 | Statistics starting from probability theory. | Statistics underpins data science and a solid understanding to build upon is important. |
Statistics Inference | Statistical Inference | Having completed Stat110, we wanted to go on to Stat111 to learn about inference in more detail; unfortunately it is not offered online, so we completed the last five relevant chapters from the MIT course instead. |
Final Project | A number of final projects. | We carried out a number of projects: designing a website showcasing this journey, analysing our study data, and taking part in two Kaggle competitions. |
CS169 | Engineering Software as a Service. | Unfortunately we ran out of time. This course is better suited to software development, and at this stage we chose to focus on data science. |
Ruby | Ruby programming language. | Unfortunately there just wasn’t enough time to start another language. |
RoR | Ruby on Rails – web framework | Unfortunately there just wasn’t enough time to do this at this stage. |
API and Web Dev | API and Web Dev | These concepts were actually covered and practised in Data Science CS109 and Visualisation CS171. |
Some things we learned about planning an open source data science masters:
1. Harder than we thought
- We did not manage to complete all the courses we initially set out to finish.
- Studying ended up taking over weekends and evenings, settling into a pattern of intense study followed by short holidays (kind of like university studying). Our takeaway from this is confirmation that we really could not have completed this while working in full-time jobs; we needed to dedicate ourselves fully to our studies.
2. Statistics foundation
After completing the Udacity statistics course, we thought we would be prepared to apply statistics as data scientists. However, this was not the case: although the course was a good starting point, it did not cover the mathematical foundations of statistics.
Our solution was therefore to study Stats 110 from Harvard. This was a brilliant course covering statistics from its very start within probability theory. We will probably never regret the time we took to understand statistics and probability properly and, were we to start again, we would study this course first before moving on to Data Analysis, Machine Learning, etc.
3. Harvard has some superb courses (available as of Nov 2014!)
The revisions to our original course plan were largely due to our experience of studying Harvard’s CS171 visualisation course online. We were blown away by the quality of teaching in this course and the resources provided; therefore we prioritised completing Stats 110 and CS109 Data Science from Harvard, in case the resources were taken down from the web. We would advise anyone browsing for courses online to keep Harvard’s courses in mind alongside Coursera and the other MOOC providers.
In fact I have caught the Harvard bug so badly that my next step is probably to study their Algorithms course.
4. To understand topics fully takes time and a lot of effort
Finally, something that most people know but is very easy to forget in our modern world of ‘6 minute abs’ and ‘4 hour work weeks’: learning something properly takes time and effort, or ‘beating on your craft’ as Will Smith once said. The courses that we learnt the most from were the ones that were sometimes painful to do, where you had to rewind the video and work late into the evening. It has made me realise that when you are looking at what courses to study online, if a course doesn’t take much work or covers a lot quickly, the trade-off will be in how deeply you understand the subject. I think online learning is great and is already changing the world, but not all courses are designed for the same thing: some are overviews and some are comprehensive. Make sure you pick the ones that fit your learning goals.
Hey, that was great. I’m doing a similar thing, except that I didn’t quit my current job and didn’t move to Thailand 🙂, but learning data science is my ultimate goal. I’ve been doing the Coursera Data Science Specialization, watching “Probabilistic Systems Analysis and Applied Probability” from MIT and following the “statistics” path on Khan Academy (the latter two mainly to get a better understanding of linear regression; I also blog about it at http://dmenin.wordpress.com/).
So let me ask: you guys seem to have learned a lot, so what are your next steps now?
Hi Diego, thanks for checking out the blog! It’s nice to hear from people who are doing similar things. I guess the question of what next is one many are asking and one that we are trying to figure out for ourselves. Watch this space 🙂 I think Christmas holidays might come first though.
I have become a data scientist using only free online resources. I am impressed by what you achieved in 6 months, even if you already have a degree in physics (I guess that helps).
The following is a list of resources you can use to continue your learning (you already know some of them):
https://skim.it/u/ThomasV/data-science-learning-path
Thank you for sharing your journey with us. I am very inspired by your commitment. I have recently become interested in data science and just started learning R. I am Thai and live in Thailand. I am happy to know that there are quite a number of people out there doing the same thing. Since you are quite ahead of me, I am sure I can learn much from your experience and will continue to follow your blog. Thanks again!
Hi Fras and Sabine,
Thanks for sharing your planned course curriculum. It will be really helpful for me, as I am taking the Johns Hopkins DSS track on Coursera. Meanwhile, please let me know the different ways of getting a job as a junior data scientist, other than taking part in Kaggle competitions.
Great material, seriously. As I’m trying to become a data scientist too, I felt really connected to what you’ve done. I’m on a slightly different path, since I have a tutor at my university who is one of the best data scientists in Brazil; thank God he advises me. So far I only know a bit of it (ROC curves, binary problems, decision trees, logistic regression, transformation, knowledge discovery and all the things you have to go through), but my experience has been great, and I hope yours becomes even better! Mail me if you ever come to Brazil; I’d love to hear some insights from all of you.