Python Machine Learning
📅 Finished on: 2020-09-13
💻 IT
⭐⭐⭐⭐⭐
Pure gold: a dense read with a lot of formulas and code, but very interesting material with a 360-degree view of ML fundamentals. I feel well prepared now and will revisit it in the future!
This is the foundation I will return to for the most useful data and formulas if I continue with Machine Learning. The drawbacks are the heavy math, which is demanding even though I know it is useful, and the ending, where Deep Learning feels a bit tacked on. Definitely a strong read. Not exactly entertaining, but I raced through some chapters and I feel I have gained a lot of confidence.
I list the chapters below so I remember what to look for and where.
- Intro: explains supervised, reinforcement (trial and error, as in chess), and unsupervised learning
- Perceptron (late 1950s, single-layer): discusses activation functions and gradient descent
- Gold: classifiers with scikit-learn. Logistic Regression, SVM, KNN with Iris. Introduces kernels
- Another gem: preprocessing explained well. Missing data, LabelEncoder, Scaler + L1 regularization (and later L2) to simplify and reduce overfitting
- Dimensionality reduction to cut computational load and overfitting: PCA, LDA (best here), and Kernel PCA
- Gold: hyperparameter optimization. K-Fold, Grid Search Cross Validation. Pipelines are used
- Bagging and AdaBoost for combining different models. I did not focus much on this section
- Basic NLP: bag-of-words on the IMDb dataset
- Challenging: embedding the sentiment model from the previous chapter in a webpage with SQLite. Very interesting
- Linear regression (already familiar), with the Housing dataset. Focus on continuous variables, time series. Introduces R², polynomial regression, and decision tree / random forest regression
- Gold: clustering. Uses K-means (with RASCAL), hierarchical, and DBSCAN for unusual shapes. Remember silhouette and elbow as techniques to check results when you cannot plot
- Deep Learning. Starts from the perceptron of the second chapter and goes deep into backpropagation and related topics
- Theano & Keras. A bit outdated given TensorFlow, but interesting. I was a bit tired so I should review it; in practice they speed up deep learning by running the computations on the GPU as well as the CPU, and they come with supporting tools
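
As a quick reminder of how the preprocessing and hyperparameter chapters fit together, here is a minimal sketch of a scikit-learn pipeline tuned with K-Fold grid search on Iris. It is not the book's code: the choice of StandardScaler plus LogisticRegression, the grid of C values, the 70/30 split, and the random seeds are all assumptions made for illustration. C is the inverse of the regularization strength, and the default penalty is L2.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load Iris and hold out a test set that the search never sees.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Scaling lives inside the pipeline, so it is re-fitted on each training fold
# and no information leaks from the validation fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Illustrative grid (an assumption): smaller C means stronger L2 regularization.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0, 100.0]}

# Stratified 5-fold cross-validation over every value in the grid.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

best_C = search.best_params_["clf__C"]
test_accuracy = search.score(X_test, y_test)  # best pipeline, refitted on all training data
print(f"best C: {best_C}, CV accuracy: {search.best_score_:.3f}, test accuracy: {test_accuracy:.3f}")
```

The step name plus double underscore (`clf__C`) is how a grid addresses a parameter of one pipeline step; swapping in an SVM or KNN only changes the `clf` step and the grid keys.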