Abstract: Accurate short-term forecasting of multiple loads is essential for the optimal scheduling and economical operation of an integrated energy system. The loads are strongly coupled, and the Transformer, whose architecture is built entirely on the self-attention mechanism, is well suited to modeling these internal relationships. However, the standard Transformer was designed for natural-language processing and is difficult to apply directly to multi-load forecasting. This paper therefore proposes a GRU-Talking-heads-Gated-residual Transformer (GRU-TGTransformer) model, in which gated recurrent units replace the original word-embedding and positional-encoding stages to extract features from the input data and produce high-dimensional feature representations that carry relative position information. A communication (talking-heads) mechanism is introduced into the multi-head self-attention to improve its expressive power, and a gating unit is added to the residual connections to improve the model's stability in time-series prediction. Using the integrated energy system of Arizona State University's Tempe campus as a case study, the proposed model is shown to achieve higher prediction accuracy than traditional models.
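To make the two attention-side modifications named in the abstract concrete, the following is a minimal NumPy sketch, not the paper's implementation: the "communication mechanism" is shown as talking-heads mixing matrices (here called `P_l` and `P_w`, illustrative names) applied across heads before and after the softmax, and the gated residual is shown in one common form (a learned gate blending the residual with the sublayer output); the paper's exact gating formula may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def talking_heads_attention(Q, K, V, P_l, P_w):
    """Illustrative talking-heads self-attention.

    Q, K, V: (heads, seq, d_k) per-head projections.
    P_l, P_w: (heads, heads) mixing matrices applied to the attention
    logits and to the softmax weights, letting heads exchange
    ("talk about") their attention patterns.
    """
    h, n, d_k = Q.shape
    logits = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (h, n, n)
    # mix logits across heads: mixed[j] = sum_i P_l[i, j] * logits[i]
    logits = np.einsum('ij,ink->jnk', P_l, logits)
    weights = softmax(logits, axis=-1)
    # mix the attention weights across heads as well
    weights = np.einsum('ij,ink->jnk', P_w, weights)
    return weights @ V                                  # (h, n, d_k)

def gated_residual(x, fx, g):
    """One common gated-residual form: a gate g in (0, 1) blends the
    residual x with the sublayer output fx (assumption, not the
    paper's exact formula)."""
    return g * x + (1.0 - g) * fx

rng = np.random.default_rng(0)
h, n, d = 4, 6, 8
Q, K, V = (rng.standard_normal((h, n, d)) for _ in range(3))
# near-identity mixing matrices, as a talking-heads layer might learn
P_l = np.eye(h) + 0.1 * rng.standard_normal((h, h))
P_w = np.eye(h) + 0.1 * rng.standard_normal((h, h))
out = talking_heads_attention(Q, K, V, P_l, P_w)
res = gated_residual(Q, out, g=0.5)
print(out.shape, res.shape)
```

With identity mixing matrices the function reduces to ordinary multi-head self-attention, which is a useful sanity check when experimenting with the mechanism.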