Course Schedule

Regular Weekly Schedule

Group  Type     Hours         Location
01     Lecture  Monday 13-16  Orenstein 111

Course Plan

The course roughly consists of three parts:

  1. online convex optimization
  2. adversarial and stochastic multi-armed bandits
  3. advanced topics.

A tentative list of topics (we will not cover all of them, and may cover some others):

  • Intro: online learning
  • Online convex optimization:
    • Follow the Leader, Follow the Regularized Leader
    • Gradient Descent, Mirror Descent
    • Convex duality
  • Online classification
  • Partial Information (Multi-Armed Bandits)
  • Stochastic MAB: UCB, Explore/Exploit, Best Arm Identification
  • Lower bounds: adversarial and stochastic (intro to Info Theory)
  • Regret and Equilibrium (zero-sum games, correlated equilibrium)
  • Special types of MAB:
    • Contextual
    • Linear
    • Markovian (Gittins Index)
    • Switching cost
  • MAB and pricing
  • Portfolio selection

Detailed Schedule

Week 1 (Oct 15)
  Topics: Introduction, online learning model, regret minimization basics
  References: Section 1 from Online Learning and Online Convex Optimization, Shai Shalev-Shwartz; Section 4.3 from Learning, Regret Minimization, and Equilibria, Avrim Blum and Yishay Mansour
  Scribe: scribe 1

Week 2 (Oct 22)
  Topics: Online Convex Optimization: intro, Follow the Leader, Follow the Regularized Leader
  References: Sections 2.1, 2.2, 2.3 from Online Learning and Online Convex Optimization, Shai Shalev-Shwartz
  Scribe: scribe 2

Week 3 (Oct 29)
  Topics: Online Gradient Descent and strongly convex regularizers
  References: Sections 2.4, 2.5 from Online Learning and Online Convex Optimization, Shai Shalev-Shwartz
  Scribe: scribe 3

Week 4 (Nov 5)
  Topics: Fenchel duality and online mirror descent
  References: Sections 2.6, 2.7, and 2.8 from Online Learning and Online Convex Optimization, Shai Shalev-Shwartz; also scribe notes here and here
  Scribe: scribe 4

Week 5 (Nov 12)
  Topics: Online classification (Perceptron and Winnow), online-to-batch
  References: Sections 3.3 and 5 from Online Learning and Online Convex Optimization, Shai Shalev-Shwartz
  Scribe: scribe 5

Week 6 (Nov 19)
  Topics: Multi-Armed Bandits
  References: Section 4 from Online Learning and Online Convex Optimization, Shai Shalev-Shwartz; also Online Convex Optimization in the Bandit Setting: Gradient Descent Without a Gradient, Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan
  Scribe: scribe 6

Week 7 (Nov 26)
  Topics: Stochastic MAB (Explore-Then-Exploit, Successive Action Elimination, Upper Confidence Bound (UCB)); Best Arm Identification (Median Elimination)
  References: Chapter 2 from Introduction to Multi-Armed Bandits, Aleksandrs Slivkins; also Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, Eyal Even-Dar, Shie Mannor, and Yishay Mansour, Journal of Machine Learning Research (2006)
  Scribe: scribe 7

Week 8 (Dec 3)
  Topics: Lower bounds
  References: Chapter 3 from Introduction to Multi-Armed Bandits, Aleksandrs Slivkins; also see here
  Scribe: scribe 8

Week 9 (Dec 10)
  Topics: Applications of regret minimization to zero-sum games and algorithm design
  References: Notes on regret and zero-sum games; Algorithm design and Regret
  Scribe: scribe 9

Week 10 (Dec 17)
  Topics: Swap regret and its applications (correlated equilibrium, dominated actions, and calibration)
  References: From External to Internal Regret, Avrim Blum and Yishay Mansour; also Sections 4.4 and 4.5 from Learning, Regret Minimization, and Equilibria, Avrim Blum and Yishay Mansour; also class notes, Section 5.8.3 and Sections 8.3 and 8.4
  Scribe: scribe 10

Week 11 (Dec 24)
  Topics: Swap regret and wide range regret
  References: From External to Internal Regret, Avrim Blum and Yishay Mansour (Sections 3 and 6); Minimizing Wide Range Regret with Time Selection Functions, Subhash Khot and Ashok Kumar Ponnuswami
  Scribe: scribe 11

Week 12 (Dec 31)
  Topics: Contextual bandits and Lipschitz bandits
  References: Parts of Chapters 5, 7, and 9 from Introduction to Multi-Armed Bandits, Aleksandrs Slivkins; also class notes on EXP4 by Brendan McMahan here and on contextual bandits by Alekh Agarwal here
  Scribe: scribe 12

Week 13 (Jan 7)
  Topics: Gittins Index
  References: Lecture notes here, based on J. Gittins, K. Glazebrook, and R. Weber, Multi-armed Bandit Allocation Indices, and John N. Tsitsiklis, A Short Proof of the Gittins Index Theorem
  Scribe: scribe 13