Cooperative Output Regulation By Q-learning For Discrete Multi-agent Systems In Finite-time

Wenjun Wei; Jingyuan Tang

doi:10.6180/jase.202306_26(6).0011

Electrical Engineering

[Figure: Three-dimensional coordinate diagram of multi-agent states at finite time]

Wenjun Wei¹,² and Jingyuan Tang¹

¹School of Automation & Electrical Engineering, Lanzhou Jiaotong University, An Ning Road, Lanzhou 730070, Gansu, China
²Key Laboratory of Opto-Technology and Intelligent Control, Ministry of Education, Lanzhou Jiaotong University, An Ning Road, Lanzhou 730070, Gansu, China

Received: March 27, 2022
Accepted: June 28, 2022
Publication Date: September 20, 2022

Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.

ABSTRACT

This article studies output regulation for discrete-time multi-agent systems with unknown models, using a finite-time optimal control algorithm that combines Q-learning with the linear quadratic regulator (LQR) method. The algorithm applies the Bellman optimality principle to derive the Q-function under global optimality, and then uses policy iteration to obtain the distributed optimal control law that minimizes the Q-function. Through local communication among agents, globally optimal control of each agent's output can be realized without relying on the dynamic model of the system. In addition, a novel finite-time local error formula reduces the output regulation synchronization time by 50%. Finally, a MATLAB simulation example demonstrates the capability of the proposed algorithm.

Keywords: Discrete multi-agent systems, Q-learning, cooperative output regulation, fast convergence
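
The Q-learning policy iteration summarized in the abstract can be illustrated on a much simpler problem. The following is a minimal Python sketch, not the article's algorithm: it assumes a single scalar plant rather than a multi-agent network, omits the finite-time local error formula, and uses hypothetical values for the plant coefficients, cost weights, and sample grid. The learner fits a quadratic Q-function from observed transitions by least squares (policy evaluation) and then takes the control that minimizes it (policy improvement), without reading the plant coefficients directly.

```python
# Minimal sketch: Q-learning policy iteration for a scalar discrete-time LQR.
# All numbers below (A_TRUE, B_TRUE, weights, samples) are illustrative
# assumptions, not values from the article.

A_TRUE, B_TRUE = 1.2, 1.0   # hidden plant; used only to generate data
Q_W, R_W = 1.0, 1.0         # stage cost Q_W*x^2 + R_W*u^2


def step(x, u):
    """Black-box plant: the learner sees only (x, u, x_next) samples."""
    return A_TRUE * x + B_TRUE * u


def solve_linear(m, rhs):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - s) / aug[i][i]
    return sol


def evaluate_policy(gain, samples):
    """Least-squares fit of Q(x,u) = h11*x^2 + 2*h12*x*u + h22*u^2
    from the Bellman equation Q(x,u) = cost + Q(x', -gain*x')."""
    rows, costs = [], []
    for x, u in samples:
        xn = step(x, u)
        un = -gain * xn
        rows.append([x * x - xn * xn, 2 * (x * u - xn * un), u * u - un * un])
        costs.append(Q_W * x * x + R_W * u * u)
    # Normal equations: (rows^T rows) h = rows^T costs.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * c for r, c in zip(rows, costs)) for i in range(3)]
    return solve_linear(ata, atb)


def q_learning_lqr(gain0, iterations=10):
    """Alternate policy evaluation and improvement from a stabilizing gain."""
    samples = [(x, u) for x in (-1.0, 0.5, 2.0) for u in (-1.0, 0.3, 1.5)]
    gain = gain0
    for _ in range(iterations):
        _, h12, h22 = evaluate_policy(gain, samples)
        gain = h12 / h22  # minimizer of Q over u is u = -(h12/h22)*x
    return gain


learned_gain = q_learning_lqr(gain0=1.0)
print(f"learned feedback gain: {learned_gain:.6f}")
```

The initial gain must stabilize the plant (here 1.2 - 1.0 * 1.0 = 0.2), which is the usual requirement for policy iteration. Because the samples are noise-free, each evaluation step recovers the Q-function of the current policy exactly, so the gain should approach the one given by the scalar Riccati equation for the assumed plant.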
