Suppose life is a multi-armed bandit (general stochastic, to be precise). That is a *solved problem*: the theoretical optimum is to pull whichever arm worked best most recently. Agree or disagree?
What is the strategy, exactly? Read literally, it sounds like you would pull one arm and then repeat it forever. I assume there is also some exploration component to it.
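Here is a minimal sketch of the lock-in worry. It assumes one literal reading of "pull whichever arm worked best most recently": a pure greedy rule on running mean rewards, with Bernoulli arms, estimates starting at 0, and ties going to the lowest index. The function `greedy_pulls` and its parameters are hypothetical, made up for illustration.

```python
import random

def greedy_pulls(probs, n_rounds, seed=0):
    """Hypothetical pure-greedy bandit: always pull the arm with the highest
    running mean reward, and never explore.

    Assumptions: rewards are Bernoulli(probs[arm]), estimates start at 0.0,
    and ties go to the lowest index (Python's max() returns the first maximal
    item). Rewards are never negative, so arm 0's estimate never drops below
    the untried arms' 0.0, and arm 0 is pulled forever.
    """
    rng = random.Random(seed)
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    pulls = []
    for _ in range(n_rounds):
        arm = max(range(len(probs)), key=lambda a: means[a])  # greedy choice
        reward = 1.0 if rng.random() < probs[arm] else 0.0   # Bernoulli reward
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]     # incremental mean
        pulls.append(arm)
    return pulls

pulls = greedy_pulls([0.1, 0.9], 1000)
print(set(pulls))  # only arm 0, even though arm 1 pays far more often
```

Under these assumptions the rule never samples arm 1, so it cannot discover that arm 1 is better.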
In a game with many rounds, you’d want to mix in some randomness, wouldn’t you? Some variation on an epsilon-greedy strategy seems to work well.
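For concreteness, here is a sketch of plain epsilon-greedy. It assumes Bernoulli arms and a fixed exploration rate. With probability `epsilon` it pulls a uniformly random arm (explore), and otherwise it pulls the arm with the best running mean (exploit). The function name, the reward model, and the default `epsilon=0.1` are illustrative assumptions.

```python
import random

def epsilon_greedy(probs, n_rounds, epsilon=0.1, seed=0):
    """Hypothetical epsilon-greedy bandit on Bernoulli arms.

    Returns per-arm pull counts, per-arm running mean rewards, and the
    total reward collected.
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                     # explore: random arm
        else:
            arm = max(range(k), key=lambda a: means[a])  # exploit: best estimate
        reward = 1.0 if rng.random() < probs[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]   # incremental mean
        total += reward
    return counts, means, total

counts, means, total = epsilon_greedy([0.1, 0.9], 5000)
print(counts)  # the better arm (index 1) ends up with most of the pulls
```

Because a fixed `epsilon` keeps exploring at the same rate forever, a common variation is to decay it over time so that play becomes mostly greedy once the estimates have settled.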