Theoretical learning goal selection for non-communicative multi-agent cooperation

Fumito Uwano, Keiki Takadama

Research output: peer-reviewed

Abstract

This paper extends PMRL, a non-communicative and theoretically grounded method for two agents, and proposes PLA, a method that forces any number of agents to learn cooperative behavior. In addition, the paper gives a theoretical explanation of why, under PLA, all agents achieve all purposes without spending the longest time. Concretely, PLA limits the purposes each agent can achieve so that the agent avoids the more difficult purposes, that is, those that take a long time to reach, and it forces the agents to learn a cooperative policy by having each agent achieve an appropriate purpose among the limited ones. The experimental results show that (1) PLA enables the agents to learn a cooperative policy in two grid-world problems with three and five agents, and (2) PLA forces all agents to achieve all purposes in those problems in the minimum time.
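The objective stated in the abstract, every agent reaching a distinct purpose while the slowest agent takes as little time as possible, can be illustrated with a small sketch. This is not PLA itself: PLA is a non-communicative learning method, whereas the code below is a centralized brute-force search with full knowledge of all positions. The function names, the use of Manhattan distance as the time to reach a goal, and the example coordinates are all assumptions made for illustration.

```python
from itertools import permutations


def manhattan(a, b):
    """Grid distance between two (row, col) cells (assumed cost model)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def min_makespan_assignment(agents, goals):
    """Brute-force search for the goal assignment with the smallest
    worst-case travel time.

    Returns (assignment, makespan), where assignment[i] is the index of
    the goal given to agent i and makespan is the largest agent-to-goal
    distance under that assignment.
    """
    best_assignment, best_makespan = None, None
    # Try every way of giving each agent a distinct goal.
    for perm in permutations(range(len(goals)), len(agents)):
        makespan = max(
            manhattan(agents[i], goals[g]) for i, g in enumerate(perm)
        )
        if best_makespan is None or makespan < best_makespan:
            best_assignment, best_makespan = perm, makespan
    return best_assignment, best_makespan


# Hypothetical three-agent grid-world layout.
agents = [(0, 0), (4, 4), (0, 4)]
goals = [(4, 3), (1, 0), (0, 3)]
assignment, makespan = min_makespan_assignment(agents, goals)
print(assignment, makespan)
```

In this layout, assigning every agent its individually nearest goal already avoids the long trips. The sketch only shows the target outcome; the paper's contribution is to have agents reach such an outcome through learning, without communication.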

Original language: English
Pages (from-to): 75-84
Number of pages: 10
Journal: IEEJ Transactions on Electronics, Information and Systems
Volume: 140
Issue number: 1
DOI
Publication status: Published - 2020
Externally published: Yes

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
