Reinforcement learning from comparisons: Three alternatives are enough, two are not
Benoit Laslier
(1, 2)
,
Jean-François Laslier
(3, 4)
Jean-François Laslier
- Fonction : Auteur
- PersonId : 10499
- IdHAL : jean-francois-laslier
- ORCID : 0000-0001-8334-1350
- IdRef : 069975124
Résumé
This paper deals with two generalizations of the Polya urn model where, instead of sampling one ball from the urn at each time, we sample two or three balls. The processes are defined on the basis of the problem of finding the best alternative using pairwise comparisons which are not necessarily transitive: they can be thought of as evolutionary processes that tend to reinforce currently efficient alternatives. The two processes exhibit different behaviors: with three balls sampled, we prove almost sure convergence towards the unique optimal solution of the comparisons problem while, in some cases, the process with two balls sampled has almost surely no limit. This is an example of a natural reinforcement model with no exchangeability whose asymptotic behavior can be precisely characterized.
Domaines
Economies et financesFormat du dépôt | Notice |
---|---|
Type de dépôt | Article dans une revue |
Titre |
en
Reinforcement learning from comparisons: Three alternatives are enough, two are not
|
Résumé |
en
This paper deals with two generalizations of the Polya urn model where, instead of sampling one ball from the urn at each time, we sample two or three balls. The processes are defined on the basis of the problem of finding the best alternative using pairwise comparisons which are not necessarily transitive: they can be thought of as evolutionary processes that tend to reinforce currently efficient alternatives. The two processes exhibit different behaviors: with three balls sampled, we prove almost sure convergence towards the unique optimal solution of the comparisons problem while, in some cases, the process with two balls sampled has almost surely no limit. This is an example of a natural reinforcement model with no exchangeability whose asymptotic behavior can be precisely characterized.
|
Auteur(s) |
Benoit Laslier
1, 2
, Jean-François Laslier
3, 4
1
ICJ -
Institut Camille Jordan
( 193738 )
- Bât. Jean Braconnier 43 bd du 11 novembre 1918 69622 VILLEURBANNE CEDEX
- France
2
PSPM -
Probabilités, statistique, physique mathématique
( 521755 )
- France
3
PSE -
Paris School of Economics
( 301309 )
- 48 boulevard Jourdan 75014 Paris
- France
4
PJSE -
Paris Jourdan Sciences Economiques
( 578027 )
- 48 boulevard Jourdan 75014 Paris
- France
|
Langue du document |
Anglais
|
Nom de la revue |
|
Vulgarisation |
Non
|
Comité de lecture |
Oui
|
Audience |
Internationale
|
Date de publication |
2017
|
Volume |
27
|
Numéro |
5
|
Page/Identifiant |
2907-2925
|
Domaine(s) |
|
DOI | 10.1214/16-AAP1271 |
Loading...