After six months of studying, we have ended up following a slightly different plan from the one we started with. Check out the original plan here.
Below is a summary of the two terms we completed, colour-coded according to whether the course was:
- Green: planned and completed
- Red: planned and not completed
- Blue: unplanned and completed
Term 1
|Course||Description||Why we did / did not study this course|
|CS106B||Programming abstractions in C++.||Algorithms and Programming are essential data science skills.|
|NLP||Machine learning applied to the language problem.||NLP is an important application of machine learning and probabilistic programming.|
|Statistics||Overview of statistics.||We thought this course would provide a good overview of statistical inference.|
|Databases||Databases and database languages.||Accessing and querying databases are still essential skills in data analysis.|
|Visualisation||Interactive browser visualisations.||Visualising data and using it to tell a story allows data science work to have an impact.|
|Stats Work||Practice problems to supplement the Udacity Statistics course.||After the Udacity Statistics course, we wanted a more theoretical understanding of statistics and therefore opted for more statistics in term 2.|
|Spanish||Spanish language course.||Very quickly, it became obvious we wouldn’t have the time to dedicate to foreign languages.|
Term 2
|Course||Description||Why we did / did not study this course|
|Data Science||Broad skillset for data science work.||Having already studied an excellent course from Harvard before, we were excited that they were offering a single course covering what we were interested in.|
|Stats110||Statistics starting from probability theory.||Statistics underpins data science, so a solid understanding to build upon is important.|
|Statistical Inference||Statistical inference.||Having completed Stat110, we wanted to go on to Stat111 to learn about inference in more detail. Unfortunately it is not offered online, so we completed the last five relevant chapters of the MIT course instead.|
|Final Project||A number of final projects.||We carried out a number of projects: designing a website showcasing this journey, analysing our study data, and entering two Kaggle competitions.|
|CS169||Engineering Software as a Service.||Unfortunately we ran out of time. This course would be suitable for software development and at this stage we chose to focus on data science.|
|Ruby||Ruby programming language.||Unfortunately there just wasn’t enough time to start another language.|
|RoR||Ruby on Rails – web framework||Unfortunately there just wasn’t enough time to do this at this stage.|
|API and Web Dev||API and Web Dev||These concepts were actually covered and practised in Data Science CS109 and Visualisation CS171.|
Some things we learned about planning an open source data science masters:
1. Harder than we thought
- We did not manage to complete all the courses we initially set out to finish.
- Studying ended up taking over weekends and evenings, settling into a pattern of intense study followed by short holidays (much like studying at university). Our takeaway is that we really could not have completed this while working in full-time jobs; we needed to dedicate ourselves fully to our studies.
2. Statistics foundation
After completing the Udacity statistics course, we thought we would be prepared to apply statistics as data scientists. However, this was not the case: although the course was a good starting point, it did not cover the mathematical foundations of statistics.
Our solution was to study Stats 110 from Harvard. This was a brilliant course covering statistics from its very beginnings in probability theory. We will probably never regret the time we took to understand statistics and probability properly. Were we to start again, we would study this course first, before moving on to Data Analysis, Machine Learning, etc.
3. Harvard has some superb courses (available as of Nov 2014!)
The revisions to our original course plan were largely due to our experience of studying Harvard’s CS171 visualisation course online. We were blown away by the quality of the teaching and the resources provided. We therefore prioritised completing Stats 110 and CS109 Data Science from Harvard, in case the resources were taken down from the web. We would advise anyone browsing for courses online to keep Harvard’s courses in mind alongside Coursera and the other MOOC providers.
In fact, I have caught the Harvard bug so badly that my next step is probably to study their Algorithms course.
4. To understand topics fully takes time and a lot of effort
Finally, something that most people know but that is very easy to forget in our modern world of ‘6 minute abs’ and ‘4 hour work weeks’: learning something properly takes time and effort, or ‘beating on your craft’ as Will Smith once said. The courses we learnt the most from were the ones that were sometimes painful to do, where we had to rewind the video and work late into the evening. It has made me realise that, when you are looking at which courses to study online, if a course doesn’t take much work or covers a lot quickly, the trade-off will be in how deeply you understand the subject. I think online learning is great and is already changing the world, but not all courses are designed for the same thing: some are overviews and some are comprehensive. Make sure you pick the ones that fit your learning goals.