All posts tagged #machine-learning

Efficient overfitting of training data (Kaggle Bowl 2017)

During the Kaggle Data Science Bowl 2017, the leaderboard was based on only $198$ samples. The opportunity for overfitting was quickly understood, but initially only the naive option was mentioned: probing one sample per submission, which would take 66 days (still doable within the competition duration, but far from ideal). But then Oleg Trott got a perfect score in just 14 submissions! I was really curious how he managed to do this. Together with Cas, I found out one way it... full post»
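The numbers in the teaser already hint at the trick: probing one sample per submission needs 198 submissions, while 14 submissions means each one must have revealed roughly 15 labels at once ($\lceil 198 / 15 \rceil = 14$). Below is a minimal sketch of how several hidden binary labels can be decoded from a single leaderboard score, assuming the metric is mean binary log loss reported to about five decimals; the probe probabilities, the number of probed samples and the hidden labels are all illustrative, and this is one way such a decoding can work rather than a claim about Trott's exact scheme.

```r
# Minimal sketch: decode several hidden binary labels from one log-loss score.
# Assumptions (not from the original post): mean binary log loss over N = 198
# samples, score reported to ~5 decimals, probe K samples per submission.
N <- 198          # leaderboard size mentioned in the post
K <- 6            # samples probed in this single submission (small for the demo)
PRECISION <- 5    # decimals of the reported score

mean_log_loss <- function(labels, preds) {
  eps <- 1e-15
  p <- pmin(pmax(preds, eps), 1 - eps)
  mean(-(labels * log(p) + (1 - labels) * log(1 - p)))
}

# Choose probe probabilities so that flipping sample i from label 0 to 1 shifts
# the total loss by a distinct "power of two" amount; every label pattern then
# yields a different score. Non-probed samples get 0.5, which contributes
# log(2) regardless of their true label.
deltas      <- 0.1 * 2 ^ (0:(K - 1))
probe_probs <- 1 / (1 + exp(deltas))
submission  <- c(probe_probs, rep(0.5, N - K))

# Hypothetical hidden labels for the probed block (unknown in reality).
hidden      <- c(1, 0, 1, 1, 0, 0)
true_labels <- c(hidden, rep(0, N - K))        # rest is irrelevant at p = 0.5
score       <- round(mean_log_loss(true_labels, submission), PRECISION)

# Decode: brute-force all 2^K label patterns, keep the one matching the score.
patterns <- as.matrix(expand.grid(rep(list(0:1), K)))
losses   <- apply(patterns, 1, function(bits)
  round(mean_log_loss(c(bits, rep(0, N - K)), submission), PRECISION))
patterns[losses == score, ]                    # recovers `hidden`
```

With the reported precision limiting how many distinct shifts fit into one score, only a handful of samples can be packed per submission, which is what makes a dozen or so submissions plausible for 198 samples.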

Linear model vs decision tree (in R)

I’ve used the R statistical language a bit, a long time ago. It was my first real encounter with data science, but later encounters used Matlab and Python. Lately, though, I’ve been picking up R again, as it’s popular in the data science community. As a practice / demo, I thought I’d do a simple exploration of the strengths and weaknesses of linear models versus decision trees. This was inspired by Claudia Perlich at KDnuggets. Let’s... full post»
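The teaser only names the topic, so here is a minimal sketch, with a made-up data-generating process, of the kind of comparison such a post might run: `lm()` versus `rpart()` on the same data, where the target mixes a linear trend (which the linear model captures well) with a sharp threshold (which the tree captures well). None of this is taken from the full post.

```r
# Minimal sketch: compare a linear model and a decision tree on toy data.
# The data-generating process is made up for illustration only.
library(rpart)

set.seed(1)
n  <- 500
x1 <- runif(n, -2, 2)
x2 <- runif(n, -2, 2)
# Target with a linear trend in x1 and a sharp threshold in x2.
y   <- 2 * x1 + 3 * (x2 > 0.5) + rnorm(n, sd = 0.3)
dat <- data.frame(x1 = x1, x2 = x2, y = y)

train <- dat[1:400, ]
test  <- dat[401:500, ]

fit_lm   <- lm(y ~ x1 + x2, data = train)
fit_tree <- rpart(y ~ x1 + x2, data = train)

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse(test$y, predict(fit_lm,   newdata = test))   # smooth trend, misses the jump
rmse(test$y, predict(fit_tree, newdata = test))   # finds the jump, steps on the trend
```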