Optimal scheduling of integrated energy systems based on prior knowledge and multi-stage QMIX reinforcement learning
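Affiliation:

1. School of Automation, Hangzhou Dianzi University, Hangzhou 310000, China; 2. Zhejiang Key Laboratory of Distributed New Energy Grid Connection and Consumption Technology Research, Hangzhou 310000, China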


Fund Project:

Supported by the Natural Science Foundation of Zhejiang Province (LY24F030010)



    Abstract:

    The multi-energy coupling characteristics and increasingly complex topology of integrated energy systems (IES) make optimal scheduling a pivotal challenge in balancing economic efficiency and operational security. To address the convergence difficulty caused by the curse of dimensionality in traditional multi-agent reinforcement learning, and the local optima that result from deficient exploration mechanisms, a real-time optimal scheduling method based on a prior knowledge-guided multi-stage QMIX architecture is proposed. First, IES real-time optimal scheduling is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), and a QMIX framework whose policy updates rely on a joint action-value function is constructed. Then, the units are clustered according to their degree of energy coupling, and a multi-stage QMIX training strategy is designed on these clusters to alleviate the curse of dimensionality. Finally, an action exploration enhancement mechanism that incorporates prior knowledge is introduced to guide the convergence trajectory. Scheduling simulations are conducted under multiple load scenarios (40 sample days). The results show that the proposed method has a significant advantage in convergence performance and effectively reduces system operating costs.
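The abstract builds on QMIX's joint action-value function. As background only (this is not the authors' implementation), the following minimal NumPy sketch shows the standard QMIX mixing idea: per-agent Q-values are combined into a joint value Q_tot by a mixer whose weights are produced from the global state by hypernetworks and forced to be non-negative, so Q_tot never decreases when any single agent's Q-value increases. That monotonicity is what allows each scheduling agent to act greedily on its own local Q-values during decentralized execution. The hypernetwork sizes, random initialization, linear final bias, and all function names (init_params, qmix_mix, greedy_joint_action, best_joint_action_by_enumeration) are illustrative assumptions; no training loop, IES model, clustering stage, or prior-knowledge exploration is included.

```python
import itertools

import numpy as np


def elu(x):
    # ELU is monotonically increasing, so it preserves the mixer's monotonicity.
    return np.where(x > 0, x, np.exp(x) - 1.0)


def init_params(state_dim, n_agents, embed, seed=0):
    """Randomly initialised hypernetwork weights (hypothetical sizes, for illustration)."""
    rng = np.random.default_rng(seed)
    return {
        "hyper_w1": rng.normal(scale=0.5, size=(state_dim, n_agents * embed)),
        "hyper_b1": rng.normal(scale=0.5, size=(state_dim, embed)),
        "hyper_w2": rng.normal(scale=0.5, size=(state_dim, embed)),
        "hyper_b2": rng.normal(scale=0.5, size=(state_dim,)),
    }


def qmix_mix(agent_qs, state, params):
    """Mix per-agent Q-values into Q_tot; non-decreasing in every agent's Q."""
    n_agents = agent_qs.shape[0]
    # Hypernetworks turn the global state into mixing weights; abs() keeps the
    # weights non-negative, which guarantees dQ_tot/dQ_a >= 0 for every agent.
    w1 = np.abs(state @ params["hyper_w1"]).reshape(n_agents, -1)
    b1 = state @ params["hyper_b1"]
    hidden = elu(agent_qs @ w1 + b1)
    w2 = np.abs(state @ params["hyper_w2"])
    # Simplification (assumption): original QMIX uses a small ReLU network for this bias.
    b2 = float(state @ params["hyper_b2"])
    return float(hidden @ w2 + b2)


def greedy_joint_action(q_tables):
    """Decentralized execution: each agent picks the argmax of its own Q-values."""
    return tuple(int(np.argmax(q)) for q in q_tables)


def best_joint_action_by_enumeration(q_tables, state, params):
    """Centralized check: search every joint action for the largest Q_tot."""
    best, best_val = None, -np.inf
    for joint in itertools.product(*(range(len(q)) for q in q_tables)):
        qs = np.array([q_tables[i][a] for i, a in enumerate(joint)])
        val = qmix_mix(qs, state, params)
        if val > best_val:
            best, best_val = joint, val
    return best, best_val


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    state_dim, n_agents, n_actions = 6, 3, 4
    params = init_params(state_dim, n_agents, embed=8)
    state = rng.normal(size=state_dim)
    q_tables = rng.normal(size=(n_agents, n_actions))

    greedy = greedy_joint_action(q_tables)
    greedy_val = qmix_mix(q_tables[np.arange(n_agents), list(greedy)], state, params)
    _, best_val = best_joint_action_by_enumeration(q_tables, state, params)
    # With monotonic mixing, per-agent greedy choices also maximize Q_tot.
    print("greedy matches brute force:", bool(np.isclose(greedy_val, best_val)))
```

The enumeration is only a correctness check: its cost grows exponentially with the number of agents, which is exactly the joint-action blow-up that value factorization such as QMIX is meant to avoid.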

Cite this article:

LOU Jing, WANG Mengyu, ZHENG Lingwei. Optimal scheduling of integrated energy systems based on prior knowledge and multi-stage QMIX reinforcement learning[J]. Power System Protection and Control, 2026, 54(07): 13-23.


History
  • Received: 2025-09-07
  • Last revised: 2025-12-30
  • Published online: 2026-03-27