Advanced
Lucas Baker@alpha
10/28/2022

Suppose life is a multi-armed bandit (general stochastic, to be precise). That is a *solved problem*: the theoretical optimum is to pull whichever arm worked best most recently. Agree or disagree?

In reply to @alpha
Varun Srinivasan@v
10/28/2022

what is the strategy exactly? read literally, it sounds like it would have you pull one arm and then repeat that same arm forever. i assume there is also some explore component to it?

In reply to @alpha
10/28/2022

In a multi-iteration game, you’d want to mix in some randomness, wouldn’t you? Some variation on an epsilon-greedy strategy seems to work well.
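
For anyone who wants to poke at this, here is a rough sketch of epsilon-greedy on a toy bandit. Everything in it is an assumption made for illustration: the arms pay out 0 or 1 with fixed probabilities (Bernoulli rewards), the function name `epsilon_greedy` and its parameters are made up, and the payout probabilities in the usage lines are arbitrary. It is not meant as the optimal policy, just a way to see what the explore step buys you.

```python
import random


def epsilon_greedy(arm_probs, epsilon=0.1, steps=10_000, seed=0):
    """Run epsilon-greedy on a toy Bernoulli bandit (illustrative only).

    arm_probs: assumed payout probability of each arm (hypothetical values).
    epsilon:   chance of pulling a uniformly random arm instead of the
               arm with the best estimated payout so far.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms      # how many times each arm was pulled
    values = [0.0] * n_arms    # running mean reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: values[a])

        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the running mean for the pulled arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return counts, values, total_reward


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # hypothetical payout rates
    for eps in (0.0, 0.1):
        counts, values, total = epsilon_greedy(probs, epsilon=eps)
        print(f"epsilon={eps}: pulls per arm={counts}, total reward={total}")
```

With `epsilon=0.0` this sketch never leaves the first arm, since estimates start at zero and ties go to the lowest index, which is the "pull one arm forever" behavior raised above. A small positive `epsilon` lets the better arms get sampled and eventually dominate.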