Abstract: With the development of smart power systems, power data is becoming massive and high-dimensional. Bad data in a power system reduces the accuracy of state estimation. Traditional clustering algorithms running on a single machine lack the computational resources to handle massive high-dimensional data, and MapReduce, which has become popular in recent years, cannot efficiently handle frequent iterative computation. To address these problems, this paper proposes a new method for identifying bad data using a parallel K-means algorithm based on Spark. Taking the load data of a single node as the research object, the Spark-based parallel K-means clustering algorithm is used to extract daily load characteristic curves, which are then used to detect and identify bad data in the state estimation of the power transmission network. Experiments on real load data provided by EUNITE show that the method effectively improves the accuracy of state estimation and, compared with the MapReduce-based method, achieves a better speed-up ratio and scalability and is better suited to processing massive power system data.
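The clustering idea summarized above can be illustrated with a minimal single-machine sketch in plain Python. It is not the paper's Spark implementation. The `assign` step plays the role that a map over the distributed load curves would play, and the `update` step plays the role of a per-cluster reduce. The function names, the evenly spaced initialization, the toy four-point "daily curves", and the deviation threshold are all assumptions made for illustration.

```python
from math import dist


def assign(curve, centroids):
    """Map-like step: pair a daily load curve with its nearest centroid index."""
    idx = min(range(len(centroids)), key=lambda i: dist(curve, centroids[i]))
    return idx, curve


def update(assignments, centroids):
    """Reduce-like step: average the curves assigned to each centroid."""
    sums = {}
    for idx, curve in assignments:
        total, count = sums.get(idx, ([0.0] * len(curve), 0))
        sums[idx] = ([t + c for t, c in zip(total, curve)], count + 1)
    # A centroid with no assigned curves keeps its previous position.
    return [
        [t / sums[i][1] for t in sums[i][0]] if i in sums else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(curves, k, iterations=20):
    """Plain K-means; initial centroids are evenly spaced curves (an assumption)."""
    centroids = [list(curves[i * len(curves) // k]) for i in range(k)]
    for _ in range(iterations):
        centroids = update([assign(c, centroids) for c in curves], centroids)
    return centroids


def flag_bad_points(curve, centroids, threshold):
    """Return the time indices where a curve deviates from its nearest
    characteristic curve by more than a (hypothetical) threshold."""
    idx, _ = assign(curve, centroids)
    reference = centroids[idx]
    return [t for t, (x, r) in enumerate(zip(curve, reference)) if abs(x - r) > threshold]


# Toy data: two kinds of days, each sampled at four times (values are made up).
low_days = [[10, 12, 20, 18], [11, 13, 21, 19], [9, 11, 19, 17]]
high_days = [[30, 32, 40, 38], [31, 33, 41, 39], [29, 31, 39, 37]]
characteristic_curves = kmeans(low_days + high_days, k=2)

# A low-load day with one corrupted measurement at time index 2.
suspect_day = [10, 12, 45, 18]
bad_indices = flag_bad_points(suspect_day, characteristic_curves, threshold=5)
```

The cluster centroids stand in for the daily load characteristic curves, and a measurement is treated as suspect when it lies far from the matching characteristic curve. In a distributed setting only the two steps inside the loop need to run in parallel, which is why keeping the data in memory across iterations matters for this kind of algorithm.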