Course Schedule
Weekday Regular Schedule
Group | Type | Hours | Location |
---|---|---|---|
01 | Lecture | Monday 13-16 | Orentstein 111 |
Course Plan
The course roughly consists of three parts:
- online convex optimization
- adversarial and stochastic multi-arm bandits
- advanced topics.
A tentative list of topics (will not cover all, and might cover some other topics):
- Intro: online learning
- Online convex optimization:
- Follow the Leader, Follow the regularized leader,
- Gradient Descent, Mirror Dscent
- Convex duality
- Online classificatrion
- Partial Information (Multi-Arm Bandits)
- Stochastic MAB: UCB, Explore/Expoit, Best Arm identification
- Lower bounds: adversarial and stochastic (intro to Info Theory)
- Regret and Equilibrium (zero-sum games, correlated equilibrium)
- Special types of MAB:
- Contextual
- linear
- Markovian (Gittins Index)
- switching cost
- MAB and pricing
- Portfolio section
Detailed Schedule
Week | Date | Lecture topics | references | Lecture Scribe |
1 | Oct 15 | Introduction, online learning model, regret minimization basics | Section 1 from: Online Learning and Online Convex Optimization. Shai Shalev-Shwartz Section 4.3 from: Learning, Regret minimization, and Equilibria, A. Blum and Y. Mansour |
scribe 1 |
2 | Oct 22 | Online Convex Optimization: intro, Follow the Leader, Follow the Regularized Leader | Sections 2.1, 2.2., 2.3 from: Online Learning and Online Convex Optimization. Shai Shalev-Shwartz |
scribe 2 |
3 | Oct 29 | Online Gradient Descent and strong convex regularizers | Sections 2.4, 2.5 from: Online Learning and Online Convex Optimization. Shai Shalev-Shwartz |
scribe 3 |
4 | Nov 5 | Fenchel duality and online mirror descent | Sections 2.6, 2.7 and 2.8 from: Online Learning and Online Convex Optimization. Shai Shalev-Shwartz Also scribe notes here and here |
scribe 4 |
5 | Nov 12 | Online classification (Perceptron and Winnow), online-to-batch | Sections 3.3 and 5 from: Online Learning and Online Convex Optimization. Shai Shalev-Shwartz |
scribe5 |
6 | Nov 19 | Multi-Arm Bandits | Section 4 from: Online Learning and Online Convex Optimization. Shai Shalev-Shwartz Also, Online convex optimization in the bandit setting: gradient descent without a gradient. Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan |
scribe6 |
7 | Nov 26 | Stochastic MAB (Explore-Then-Exploit, Successive Action Elimination, Upper Confidence Bound (UCB)) Best Arm Identification (Median Algorithm) |
Chapter 2: Introduction to Multi-Armed Bandits Aleksandrs Slivkins Also, Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems Eyal Even-Dar, Shie Mannor and Yishay Mansour, Journal of Machine Learning Research (2006). |
scribe7 |
8 | Dec 3 | Lower bounds | Chapter 3: Introduction to Multi-Armed Bandits Aleksandrs Slivkins Also see here |
scribe8 |
9 | Dec 10 | Applications of regret minimization for zero sum games and algorithm design. | Notes on regret and zero sum games Algorithm design and Regret |
scribe9 |
10 | Dec 17 | Swap regret and its applications (correlated equilibrium, dominated actions and calibration) | From External to Internal Regret Avrim Blum and Yishay Mansour Also sections 4.4 and 4.5 from: Learning, Regret minimization, and Equilibria Avrim Blum and Yishay Mansour Also class notes Section 5.8.3 and Section 8.3 and 8.4 |
scribe10 |
11 | Dec 24 | Swap regret and wide range regret | From External to Internal Regret Avrim Blum and Yishay Mansour (section 3 and 6) Minimizing Wide Range Regret with Time Selection Functions Subhash Khot and Ashok Kumar Ponnuswami |
scribe11 |
12 | Dec 31 | Contextual bandits and Lipschitz bandits | Parts of chapters 5,7,9: Introduction to Multi-Armed Bandits Aleksandrs Slivkins also class notes on EXP4 by Brendan McMahan here and contextual bandits by Alekh Agarwal here |
scribe12 |
13 | Jan 7 | Gittins Index | Lecture notes here, based on: J. Gittins, K. Glazerbrook and R. Weber, Multi-Arm Bandit Allocation Indices John N. Tsitsiklis, A short proof of the Gittins index theorem |
scribe13 |