Machine Learning
Statistical machine learning is a growing discipline at the intersection of computer science and applied mathematics (probability, statistics, optimization, etc.) that plays an increasingly important role in many other scientific disciplines.
This course will cover supervised and unsupervised learning, as well as deep learning. The first part of the course, on statistical machine learning, will focus on the analysis of high-dimensional data and on the efficiency of algorithms for processing the large amounts of data encountered in many application areas.
The second part of the course will present the fundamental principles and methods of recent deep learning techniques and their links with theoretical physics.
Evaluation: homework and an oral final project.
Prerequisites:
- Proficiency in Python: the tutorial here is recommended for those who are less familiar with Python
- Basic Calculus, Linear Algebra
- Basic Probability and Statistics
1. Fundamentals of prediction and supervised learning
Fundamentals of prediction
- Minimizing errors
- Modeling knowledge
- Prediction via optimization
- Types of errors and successes
- The Neyman-Pearson Lemma
- Properties of ROC curves
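A minimal sketch of the ideas above on a toy detection problem: sweeping a threshold on a scalar score traces out the ROC curve, and the Neyman-Pearson lemma says that thresholding the likelihood ratio is optimal at any fixed false-positive rate. The distributions and thresholds below are illustrative, not from the course.

```python
import numpy as np

# Binary detection problem: scores are N(0,1) under H0 and N(1,1) under H1.
rng = np.random.default_rng(0)
scores_h0 = rng.normal(0.0, 1.0, 2000)
scores_h1 = rng.normal(1.0, 1.0, 2000)

# Each threshold gives one (FPR, TPR) point on the ROC curve.
for t in [-1.0, 0.0, 0.5, 1.0, 2.0]:
    fpr = (scores_h0 > t).mean()   # false-positive rate under H0
    tpr = (scores_h1 > t).mean()   # true-positive rate under H1
    print(f"threshold {t:+.1f}: FPR = {fpr:.2f}, TPR = {tpr:.2f}")

# Area under the ROC curve as P(score under H1 exceeds score under H0).
auc = (scores_h1[:, None] > scores_h0[None, :]).mean()
print(f"empirical AUC ~ {auc:.3f}")
```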
Supervised learning
- Sample versus Population
- A first learning algorithm: the perceptron
- Connection to empirical risk minimization
- Formal guarantees for the perceptron
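A minimal sketch of the perceptron update rule discussed in this unit, in plain NumPy. The toy data, learning schedule, and stopping rule are illustrative choices, not the course's implementation.

```python
import numpy as np

def perceptron(X, y, n_epochs=100):
    """Perceptron rule: update the weights only on misclassified points.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    Returns (w, b) such that sign(X @ w + b) predicts y.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:                # converged on separable data
            break
    return w, b

# Toy linearly separable data (illustrative).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
w, b = perceptron(X, y)
print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```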
2. Unsupervised learning: k-means and EM
- K-means clustering
- Mixtures of Gaussians
- Expectation-Maximization for GMM
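A minimal k-means sketch (Lloyd's algorithm): alternate between assigning points to their nearest center and recomputing centers as cluster means. Initialization, stopping criterion, and data are illustrative, and empty clusters are not handled.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm for k-means clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random init
    for _ in range(n_iters):
        # Assignment step: nearest center for each point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: centers become the means of their clusters.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy data: three well-separated blobs (illustrative).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, (50, 2)) for m in (-2, 0, 2)])
centers, labels = kmeans(X, k=3)
print(centers)
```

EM for a Gaussian mixture follows the same alternating pattern, but with soft responsibilities in place of hard assignments and with covariances and mixing weights updated alongside the means.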
3. Kernels
- Local averaging methods
  - partition estimators
  - k-nearest neighbors
  - kernel smoothing
- Positive-definite kernel methods
  - representer theorem
  - kernel trick
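A minimal kernel ridge regression sketch illustrating the kernel trick: by the representer theorem the solution lives in the span of the training points, so only the kernel matrix is needed. The RBF kernel, bandwidth, and regularization values below are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def fit_kernel_ridge(X, y, lam=1e-3, gamma=1.0):
    """Representer theorem: f(x) = sum_i alpha_i k(x, x_i),
    with alpha = (K + lam * n * I)^{-1} y."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_train, alpha, X_test, gamma=1.0):
    return rbf_kernel(X_test, X_train, gamma) @ alpha

# Fit a noisy sine curve (illustrative data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, (100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
alpha = fit_kernel_ridge(X, y)
X_test = np.linspace(0, 2 * np.pi, 5)[:, None]
print(predict(X, alpha, X_test))
```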
4. Bayesian and Variational Inference
- Gaussian models
- Linear regression
- Logistic regression
- Laplace method
- Variational inference
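A minimal sketch of exact Bayesian inference for linear regression, where a Gaussian prior and Gaussian noise give a Gaussian posterior in closed form (the prior and noise variances below are illustrative). Logistic regression has no such closed form, which is where the Laplace method and variational inference come in.

```python
import numpy as np

def bayesian_linear_regression(X, y, sigma2=0.1, tau2=1.0):
    """Posterior over weights for y = X w + noise.

    Prior w ~ N(0, tau2 * I), noise ~ N(0, sigma2). By conjugacy the
    posterior is Gaussian; returns its mean and covariance.
    """
    d = X.shape[1]
    S_inv = X.T @ X / sigma2 + np.eye(d) / tau2   # posterior precision
    S = np.linalg.inv(S_inv)                      # posterior covariance
    m = S @ X.T @ y / sigma2                      # posterior mean
    return m, S

# Toy 1-D regression with an intercept column (illustrative data).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
X = np.column_stack([np.ones_like(x), x])
y = 0.5 + 2.0 * x + np.sqrt(0.1) * rng.normal(size=50)
m, S = bayesian_linear_regression(X, y)
print("posterior mean:", m)                       # close to the true [0.5, 2.0]
print("posterior std: ", np.sqrt(np.diag(S)))
```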
5. Optimization for machine learning
- Gradient descent
- Stochastic gradient descent (SGD)
- Over-parameterized models
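A minimal sketch comparing full-batch gradient descent with mini-batch SGD on a least-squares objective. Step sizes, batch size, and the synthetic data are illustrative choices.

```python
import numpy as np

def grad(w, X, y):
    """Gradient of the mean squared error (1/n) * ||X w - y||^2."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=500)

# Full-batch gradient descent: one gradient per pass over all the data.
w = np.zeros(10)
for _ in range(200):
    w -= 0.1 * grad(w, X, y)

# Stochastic gradient descent: noisy gradients from mini-batches.
w_sgd = np.zeros(10)
for epoch in range(50):
    idx = rng.permutation(len(y))
    for batch in np.array_split(idx, len(y) // 32):
        w_sgd -= 0.05 * grad(w_sgd, X[batch], y[batch])

print("GD  error:", np.linalg.norm(w - w_true))
print("SGD error:", np.linalg.norm(w_sgd - w_true))
```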
6. PyTorch basics and autodiff
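A minimal PyTorch autodiff example (assumes torch is installed; the function being differentiated is an arbitrary illustration). Autograd records operations on tensors with requires_grad=True and computes gradients by reverse-mode differentiation.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum() + torch.sin(x).sum()   # scalar function of x
y.backward()                              # reverse-mode autodiff

print(x.grad)                                      # dy/dx = 2x + cos(x)
print(2 * x.detach() + torch.cos(x.detach()))      # matches the analytic gradient
```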
7. Autoencoders and Normalizing Flows
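A minimal PyTorch autoencoder sketch: compress inputs to a low-dimensional code and train to reconstruct them. Layer sizes, the training loop, and the random stand-in data are illustrative; normalizing flows additionally require invertible layers with a tractable log-determinant, which this sketch does not cover.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Encode inputs to a low-dimensional code, then decode back."""
    def __init__(self, dim_in=784, dim_code=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(),
                                     nn.Linear(128, dim_code))
        self.decoder = nn.Sequential(nn.Linear(dim_code, 128), nn.ReLU(),
                                     nn.Linear(128, dim_in))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)              # stand-in for a batch of flattened images
for _ in range(10):                  # a few reconstruction steps
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), x)
    loss.backward()
    opt.step()
print(loss.item())
```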
8. Diffusion models (DDPM)
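A minimal sketch of the DDPM forward (noising) process and the simple noise-prediction training loss. The noise schedule is illustrative, the denoiser is a stand-in MLP rather than the U-Net used in practice, and sampling is omitted.

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative product of (1 - beta_t)

def q_sample(x0, t, eps):
    """Forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    a = alphas_bar[t].sqrt().view(-1, 1)
    s = (1.0 - alphas_bar[t]).sqrt().view(-1, 1)
    return a * x0 + s * eps

# Stand-in denoiser: predicts the noise eps from (x_t, t).
denoiser = nn.Sequential(nn.Linear(2 + 1, 64), nn.ReLU(), nn.Linear(64, 2))

x0 = torch.randn(128, 2)                          # toy 2-D data batch
t = torch.randint(0, T, (128,))
eps = torch.randn_like(x0)
xt = q_sample(x0, t, eps)
eps_pred = denoiser(torch.cat([xt, t.float().view(-1, 1) / T], dim=1))
loss = nn.functional.mse_loss(eps_pred, eps)      # simple DDPM objective
print(loss.item())
```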
Resources: