A. Agarwal, D. Hsu, S. Kale, J. Langford, L. Li et al., Taming the monster: A fast and simple algorithm for contextual bandits, Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp.1638-1646, 2014.

O. Armantier, Does observation influence learning?, Games and Economic Behavior, vol.46, issue.2, pp.221-239, 2004.
DOI : 10.1016/S0899-8256(03)00124-6

K. J. Arrow and J. R. Green, Notes on Expectations Equilibria in Bayesian Settings, Working Paper Institute for Mathematical Studies in the Social Sciences, vol.33, 1973.

P. Auer, Using Confidence Bounds for Exploitation-Exploration Trade-offs, Journal of Machine Learning Research, vol.3, pp.397-422, 2002.

A. Bandura and F. J. McDonald, Influence of social reinforcement and the behavior of models in shaping children's moral judgment, The Journal of Abnormal and Social Psychology, vol.67, issue.3, pp.274-281, 1963.
DOI : 10.1037/h0044714

A. Bandura, D. Ross, and S. A. Ross, Vicarious reinforcement and imitative learning, The Journal of Abnormal and Social Psychology, vol.67, issue.6, pp.601-607, 1963.
DOI : 10.1037/h0045550

J. Banks, M. Olson, and D. Porter, An experimental analysis of the bandit problem, Economic Theory, vol.10, issue.1, pp.55-77, 1997.
DOI : 10.1007/s001990050146

R. Bayer and H. Wu, Do we learn from our own experience or from observing others?, 2016.

A. Beygelzimer, J. Langford, L. Li, L. Reyzin, and R. E. Schapire, Contextual bandit algorithms with supervised learning guarantees, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

B. Bossan, O. Jann, and P. Hammerstein, The evolution of social learning and its economic consequences, Journal of Economic Behavior & Organization, vol.112, pp.266-288, 2015.
DOI : 10.1016/j.jebo.2015.01.010

J. Bouchaud, Crises and Collective Socio-Economic Phenomena: Simple Models and Challenges, Journal of Statistical Physics, vol.151, issue.3-4, pp.567-606, 2013.
DOI : 10.1007/s10955-012-0687-3

URL : http://arxiv.org/pdf/1209.0453

M. Bray, Learning, estimation, and the stability of rational expectations, Journal of Economic Theory, vol.26, issue.2, pp.318-339, 1982.
DOI : 10.1016/0022-0531(82)90007-2

S. Brown, M. Steyvers, and E. Wagenmakers, Observing evidence accumulation during multi-alternative decisions, Journal of Mathematical Psychology, vol.53, issue.6, pp.453-462, 2009.
DOI : 10.1016/j.jmp.2009.09.002

C. J. Burke, P. N. Tobler, M. Baddeley, and W. Schultz, Neural mechanisms of observational learning, Proceedings of the National Academy of Sciences, pp.14431-14436, 2010.

R. R. Bush and F. Mosteller, A mathematical model for simple learning, Psychological Review, vol.58, issue.5, pp.313-323, 1951.
DOI : 10.1037/h0054388

C. Camerer and T. Ho, Experience-weighted Attraction Learning in Normal Form Games, Econometrica, vol.67, issue.4, pp.827-874, 1999.
DOI : 10.1111/1468-0262.00054

URL : http://www.hss.caltech.edu/SSPapers/wp1003.pdf

J. D. Cohen, S. M. McClure, and A. J. Yu, Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration, Philosophical Transactions of the Royal Society B: Biological Sciences, vol.362, issue.1481, pp.933-942, 2007.
DOI : 10.1098/rstb.2007.2098

M. Dufwenberg, R. Sundaram, and D. J. Butler, Epiphany in the Game of 21, Journal of Economic Behavior & Organization, vol.75, issue.2, pp.132-143, 2010.
DOI : 10.1016/j.jebo.2010.03.025

C. Efferson, P. J. Richerson, R. McElreath, M. Lubell, E. Edsten et al., Learning, productivity, and noise: an experimental study of cultural transmission on the Bolivian Altiplano, Evolution and Human Behavior, vol.28, issue.1, pp.11-17, 2007.
DOI : 10.1016/j.evolhumbehav.2006.05.005

I. Erev and A. E. Roth, Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria, American Economic Review, vol.88, pp.848-881, 1998.

U. Fischbacher, z-Tree: Zurich toolbox for ready-made economic experiments, Experimental Economics, vol.10, issue.2, pp.171-178, 2007.
DOI : 10.1007/s10683-006-9159-4

URL : https://link.springer.com/content/pdf/10.1007%2Fs10683-006-9159-4.pdf

R. Fryer and P. Harms, Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability, 2017.
DOI : 10.1287/moor.2017.0863

URL : http://arxiv.org/pdf/1506.07291

M. J. Fryling, C. Johnston, and L. J. Hayes, Understanding Observational Learning: An Interbehavioral Approach, The Analysis of Verbal Behavior, pp.191-203, 2011.

A. Garivier and E. Kaufmann, Optimal Best Arm Identification with Fixed Confidence, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01273838

B. Greiner, Subject pool recruitment procedures: organizing experiments with ORSEE, Journal of the Economic Science Association, vol.1, issue.1, pp.114-125, 2015.
DOI : 10.1007/s40881-015-0004-4

R. J. Herrnstein, Experiments on stable suboptimality in individual behavior, American Economic Review Papers and Proceedings, vol.81, pp.360-364, 1991.

T. T. Hills, P. M. Todd, D. Lazer, A. D. Redish, and I. D. Couzin, Exploration versus exploitation in space, mind, and society, Trends in Cognitive Sciences, vol.19, issue.1, pp.46-54, 2015.
DOI : 10.1016/j.tics.2014.10.004

C. A. Holt and S. K. Laury, Risk Aversion and Incentive Effects, American Economic Review, vol.92, issue.5, pp.1644-1655, 2002.
DOI : 10.1257/000282802762024700

URL : http://people.virginia.edu/~cah2k/highpay.pdf

Y. Hu, Y. Kayaba, and M. Shum, Nonparametric learning rules from bandit experiments: The eyes have it!, Games and Economic Behavior, vol.81, pp.215-231, 2013.
DOI : 10.1016/j.geb.2013.05.003

URL : http://www.econ.jhu.edu/pdf/papers/wp560.pdf

A. Kirman, Learning by Firms about Demand Conditions, Adaptive Economic Models, pp.137-156, 1975.
DOI : 10.1016/B978-0-12-207350-2.50008-7

P. Laird and R. Saul, Discrete sequence prediction and its applications, Machine Learning, pp.43-68, 1994.
DOI : 10.1007/BF01000408

URL : http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19960023221_1996047986.pdf

D. Laureiro-Martínez, S. Brusoni, N. Canessa, and M. Zollo, Understanding the exploration-exploitation dilemma: An fMRI study of attention control and decision-making performance, Strategic Management Journal, vol.36, issue.3, pp.319-338, 2015.
DOI : 10.1002/smj.2221

D. Laureiro-Martínez, N. Canessa, S. Brusoni, M. Zollo, T. Hare et al., Frontopolar cortex and decision-making efficiency: comparing brain activity of experts with different professional background during an exploration-exploitation task, Frontiers in Human Neuroscience, vol.7, pp.1-10, 2014.
DOI : 10.3389/fnhum.2013.00927

D. Marchiori and M. Warglien, Predicting Human Interactive Learning by Regret-Driven Neural Networks, Science, vol.319, issue.5866, pp.1111-1113, 2008.
DOI : 10.1126/science.1151185

URL : https://iris.unive.it/bitstream/10278/29818/1/MWScience.pdf

R. McElreath, M. Lubell, P. J. Richerson, T. M. Waring, W. Baum et al., Applying evolutionary models to the laboratory study of social learning, Evolution and Human Behavior, vol.26, 2005.

C. McKinney and J. Van Huyck, Eureka Learning: Heuristics and response time in perfect information games, Games and Economic Behavior, vol.79, pp.223-232, 2013.
DOI : 10.1016/j.geb.2013.02.003

H. B. McMahan and M. J. Streeter, Tighter Bounds for Multi-Armed Bandits with Expert Advice, COLT 2009 - The 22nd Conference on Learning Theory, 2009.

J. Nadal, O. Chenevez, G. Weisbuch, and A. Kirman, A Formal Approach to Market Organization: Choice Functions, Mean Field Approximation and Maximum Entropy Principle, Advances in Self-Organization and Evolutionary Economics, pp.149-159, 1998.

A. Nedic, D. Tomlin, P. Holmes, D. A. Prentice, and J. D. Cohen, A Decision Task in a Social Context: Human Experiments, Models, and Analyses of Behavioral Data, Interaction Dynamics: The Interface of Humans and Smart Machines, Proceedings of the IEEE, pp.713-733, 2012.

L. Smith and P. N. Sørensen, Observational learning, The New Palgrave Dictionary of Economics, 2011.
DOI : 10.1057/9780230226203.3870

D. Sonsino, Learning to Learn, Pattern Recognition, and Nash Equilibrium, Games and Economic Behavior, vol.18, issue.2, pp.286-331, 1997.
DOI : 10.1006/game.1997.0532

L. Spiliopoulos, Pattern recognition and subjective belief learning in a repeated constant-sum game, Games and Economic Behavior, vol.75, issue.2, pp.921-935, 2012.
DOI : 10.1016/j.geb.2012.01.005

M. Steyvers, M. D. Lee, and E. Wagenmakers, A Bayesian analysis of human decision-making on bandit problems, Journal of Mathematical Psychology, vol.53, issue.3, pp.168-179, 2009.
DOI : 10.1016/j.jmp.2008.11.002

M. Woodford, Learning to Believe in Sunspots, Econometrica, vol.58, issue.2, pp.277-307, 1990.
DOI : 10.2307/2938205