TY - JOUR
T1 - Theoretical learning goal selection for non-communicative multi-agent cooperation
AU - Uwano, Fumito
AU - Takadama, Keiki
N1 - Publisher Copyright: © 2020 The Institute of Electrical Engineers of Japan.
PY - 2020
Y1 - 2020
N2 - This paper extends PMRL, a non-communicative, theoretically grounded method for two agents, and proposes PLA, a method that forces agents to learn cooperative behavior for any number of agents. In addition, the paper gives a theoretical explanation of why, under PLA, all agents achieve all purposes without spending the largest time. Concretely, PLA limits the purposes each agent can achieve, which forces each agent to avoid the more difficult purposes that take a long time to reach, and it forces the agents to learn a cooperative policy by achieving the appropriate purpose among the limited ones. The experimental results show that (1) PLA enables the agents to learn a cooperative policy in the two grid-world problems for three and five agents, and (2) PLA can force all agents to achieve all purposes in those problems in the minimum time.
AB - This paper extends PMRL, a non-communicative, theoretically grounded method for two agents, and proposes PLA, a method that forces agents to learn cooperative behavior for any number of agents. In addition, the paper gives a theoretical explanation of why, under PLA, all agents achieve all purposes without spending the largest time. Concretely, PLA limits the purposes each agent can achieve, which forces each agent to avoid the more difficult purposes that take a long time to reach, and it forces the agents to learn a cooperative policy by achieving the appropriate purpose among the limited ones. The experimental results show that (1) PLA enables the agents to learn a cooperative policy in the two grid-world problems for three and five agents, and (2) PLA can force all agents to achieve all purposes in those problems in the minimum time.
KW - Multi-agent system
KW - Reinforcement learning
KW - Reward management
UR - http://www.scopus.com/inward/record.url?scp=85077517064&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077517064&partnerID=8YFLogxK
U2 - 10.1541/ieejeiss.140.75
DO - 10.1541/ieejeiss.140.75
M3 - Article
AN - SCOPUS:85077517064
SN - 0385-4221
VL - 140
SP - 75
EP - 84
JO - IEEJ Transactions on Electronics, Information and Systems
JF - IEEJ Transactions on Electronics, Information and Systems
IS - 1
ER -