Yuanheng Zhu

State Key Laboratory for Management and Control of Complex Systems

Institute of Automation, Chinese Academy of Sciences

Beijing 100190, China

Phone: +86-130-0118-1922; Fax: +86-10-82544799


Research Areas

Multi-agent reinforcement learning

Deep reinforcement learning

Sequential games

Cooperation and competition

Swarm intelligence

Education

09/2010--07/2015, Institute of Automation, Chinese Academy of Sciences, PhD

09/2006--07/2010, Nanjing University, B.S.

Work Experience

07/2015--now, Institute of Automation, Chinese Academy of Sciences, Assistant Researcher, Associate Researcher

12/2017--12/2018, University of Rhode Island, Visiting Scholar

Teaching Experience

2018/2019, University of Chinese Academy of Sciences, Reinforcement Learning (with Prof Dongbin Zhao)

2019/2020, 2020/2021, University of Chinese Academy of Sciences, Reinforcement Learning (with Profs Dongbin Zhao and Qichao Zhang)


Publications

[1] Synthesis of Cooperative Adaptive Cruise Control with Feedforward Strategies, IEEE Transactions on Vehicular Technology, 2020-02, First Author.
[2] Vision-based control in the open racing car simulator with deep and reinforcement learning, Journal of Ambient Intelligence and Humanized Computing, 2019-09, First Author.
[3] LMI-Based Synthesis of String-Stable Controller for Cooperative Adaptive Cruise Control, IEEE Transactions on Intelligent Transportation Systems, 2019-08, First Author.
[4] Control-limited adaptive dynamic programming for multi-battery energy storage systems, IEEE Transactions on Smart Grid, 2019-07, First Author.
[5] Adaptive optimal control of heterogeneous CACC system with uncertain dynamics, IEEE Transactions on Control Systems Technology, 2019-07, First Author.
[6] Invariant Adaptive Dynamic Programming for Discrete-Time Optimal Control, IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019-04, First Author.
[7] StarCraft Micromanagement With Reinforcement Learning and Curriculum Transfer Learning, IEEE Transactions on Emerging Topics in Computational Intelligence, 2019-02, Second Author.
[8] Comprehensive comparison of online ADP algorithms for continuous-time optimal control, Artificial Intelligence Review, 2018-04, First Author.
[9] Recent progress of deep reinforcement learning: from AlphaGo to AlphaGo Zero, Control Theory & Applications, 2017-12, Fourth Author.
[10] Adaptive dynamic programming for robust neural control of unknown continuous-time non-linear systems, IET Control Theory & Applications, 2017-09, Fourth Author.
[11] Event-Triggered Optimal Control for Partially Unknown Constrained-Input Systems via Adaptive Dynamic Programming, IEEE Transactions on Industrial Electronics, 2017-05, First Author.
[12] Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data, IEEE Transactions on Neural Networks and Learning Systems, 2017-03, First Author.
[13] Policy iteration for H∞ optimal control of polynomial nonlinear systems via sum of squares programming, IEEE Transactions on Cybernetics, 2017-02, First Author.
[14] Data-driven adaptive dynamic programming for continuous-time fully cooperative games with partially constrained inputs, Neurocomputing, 2017-02, Third Author.
[15] Probably approximately correct reinforcement learning solving continuous-state control problem, Control Theory & Applications, 2016-12, First Author.
[16] Using reinforcement learning techniques to solve continuous-time non-linear optimal tracking problem without system dynamics, IET Control Theory & Applications, 2016-07, First Author.
[17] Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems, Cognitive Computation, 2015-06, First Author.
[18] A data-based online reinforcement learning algorithm satisfying probably approximately correct principle, Neural Computing and Applications, 2015-04, First Author.
[19] MEC-A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems, IEEE Transactions on Neural Networks and Learning Systems, 2015-02, Second Author.
[20] Convergence analysis and application of fuzzy-HDP for nonlinear discrete-time HJB systems, Neurocomputing, 2015-02, First Author.

Patents

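[1] Cooperative adaptive cruise control method for heterogeneous vehicle platoons based on acceleration feedforward, Invention Patent, 2020, First Author, Patent No.: 201911110197.3
[2] Optimal control method, system, and storage medium for multi-battery energy storage systems, Invention Patent, 2020, First Author, Patent No.: 201810967603.7
[3] Lane-keeping method and system for intelligent driving, Invention Patent, 2018, Fifth Author, Patent No.: 201811260601.0
[4] Robust tracking control method for spring-mass-damper systems, Invention Patent, 2018, Third Author, Patent No.: 201810004181.3
[5] Data-based Q-function adaptive dynamic programming method, Invention Patent, 2013, Second Author, Patent No.: 201310036976.X
[6] Method and system for detecting abnormal charging and discharging behavior of energy storage batteries, Invention Patent, 2016, Third Author, Patent No.: 201610687158.X
[7] Multi-agent deep reinforcement learning method and system based on counterfactual returns, Invention Patent, 2020, Third Author, Patent No.: 201911343902.4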