Yue [Yue, J. C. (1999). Generalized two-stage bandit problem. Commun. Statist. Theo. Meth. 28(9):2261-2276] used a two-stage approach to explore the Bernoulli two-armed bandit problem, assuming that one arm has a smaller prior variance than the other. In this paper, adapting Yue's assumption, we study the structure of the optimal strategy, that is, the strategy that maximizes the expected number of successes. We confirm the conjecture of Pearson [Pearson, L. M. (1980). Treatment Allocation for Clinical Trials in Stages. Ph.D. thesis, University of Minnesota] that it is never optimal to allocate an equal number of observations to two identical arms in the first stage.
Communications in Statistics: Theory and Methods, 33(7), 1577-1585
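
To make the objective in the abstract concrete, the following is a minimal sketch of how the expected number of successes of a two-stage allocation could be evaluated. It rests on assumptions that the abstract does not state: each arm has an independent Beta(a, b) prior, the first stage places n1 and n2 observations on the two arms, and every remaining observation goes to the arm with the larger posterior mean. The function names and the parameters `total`, `n1`, `n2`, `prior1`, and `prior2` are hypothetical and are not taken from the paper.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(s, n, a, b):
    """P(s successes in n Bernoulli trials) under a Beta(a, b) prior."""
    return comb(n, s) * exp(log_beta(a + s, b + n - s) - log_beta(a, b))


def expected_successes(total, n1, n2, prior1, prior2):
    """Expected successes of a two-stage rule (illustrative assumption).

    Stage 1: n1 observations on arm 1 and n2 on arm 2.
    Stage 2: all remaining observations go to the arm whose
    posterior mean is larger after stage 1.
    """
    a1, b1 = prior1
    a2, b2 = prior2
    remaining = total - n1 - n2
    if n1 < 0 or n2 < 0 or remaining < 0:
        raise ValueError("stage sizes must be non-negative and fit in total")

    # Stage 1 contributes the prior mean of each arm per observation.
    stage1 = n1 * a1 / (a1 + b1) + n2 * a2 / (a2 + b2)

    # Stage 2: average the larger posterior mean over all stage-1 outcomes.
    best_mean = 0.0
    for s1 in range(n1 + 1):
        p1 = beta_binomial_pmf(s1, n1, a1, b1)
        m1 = (a1 + s1) / (a1 + b1 + n1)
        for s2 in range(n2 + 1):
            p2 = beta_binomial_pmf(s2, n2, a2, b2)
            m2 = (a2 + s2) / (a2 + b2 + n2)
            best_mean += p1 * p2 * max(m1, m2)

    return stage1 + remaining * best_mean


if __name__ == "__main__":
    # Compare every split of a 6-observation first stage out of 20 in total,
    # with identical uniform priors on both arms (illustrative values only).
    for k in range(7):
        value = expected_successes(20, k, 6 - k, (1, 1), (1, 1))
        print(k, 6 - k, round(value, 6))
```

Scanning the first-stage splits in this way is one way to explore numerically how the allocation affects the objective; it is an illustration under the stated assumptions and not a substitute for the paper's argument.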