Machine learning helps Stanford physicists predict dangerous solar flares earlier

January 14, 2015

This solar flare was captured Jan. 14 by NASA’s Solar Dynamics Observatory. Stanford physicists are using artificial intelligence techniques in an attempt to predict such flares. (Credit: NASA/SDO and the AIA; EVE; and HMI science teams)

Using artificial intelligence techniques to forecast solar flares*, Stanford solar physicists have automated the analysis of the largest-ever set of solar observations, using data from the Solar Dynamics Observatory (SDO).

The goal is to identify which features are most useful for predicting solar flares. That requires processing more data, some 1.5 terabytes a day, than has come from any other satellite in NASA history, according to solar physicists Monica Bobra and Sebastien Couvidat.

Their study uses data from an instrument aboard SDO, the Helioseismic and Magnetic Imager (HMI), which collects vector magnetic field data and other observations of the entire surface of the sun almost continuously. The Stanford Solar Observatories Group, headed by physics Professor Phil Scherrer, processes and stores the SDO data.

Machine learning for earlier solar-flare warnings

The physicists decided to use this data to predict the strength of solar flares, such as M-class or X-class, using machine learning. (M-class flares can cause minor radiation storms that might endanger astronauts and cause brief radio blackouts at Earth’s poles. X-class flares are the most powerful.)

To do that, the researchers first catalogued flaring and non-flaring regions from a database of more than 2,000 active regions and then characterized those regions by 25 features such as energy, current and field gradient. Next, they fed the machine-learning system 70 percent of the data to train it to identify relevant features. Finally, they used the system to analyze the remaining 30 percent of the data to test its accuracy in predicting solar flares.
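The 70/30 division described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function name `split_train_test`, the random seed, and the made-up `regions` records (25 random numbers plus a flared/did-not-flare label) are all assumptions introduced here.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and testing subsets."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for repeatability
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in data: 2,000 active regions, each described by
# 25 made-up feature values and a label saying whether it flared.
rng = random.Random(42)
regions = [
    ([rng.random() for _ in range(25)], rng.random() < 0.1)
    for _ in range(2000)
]

train, test = split_train_test(regions)
print(len(train), len(test))
```

Holding back the 30 percent matters because a system scored on the same regions it learned from would look more accurate than it really is.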

Machine learning confirmed that the topology of the magnetic field and the energy stored in the magnetic field are very relevant to predicting solar flares. Using just a few of the 25 features, machine learning discriminated between active regions that would flare and those that would not flare. Although others have used different methods to come up with similar results, machine learning provides a significant improvement because automated analysis is faster and could provide earlier warnings of solar flares.
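One simple way to see how a feature can separate flaring from non-flaring regions is to score each feature by how far apart the two groups' averages sit relative to their spread. The sketch below uses such a Fisher-style score as an assumption for illustration; the article does not say which selection method the researchers used, and the feature names and numbers here are invented toy values.

```python
from statistics import mean, variance


def separation_score(values, labels):
    """Fisher-style score: squared gap between class means over summed variances."""
    flaring = [v for v, y in zip(values, labels) if y]
    quiet = [v for v, y in zip(values, labels) if not y]
    spread = variance(flaring) + variance(quiet)
    if spread == 0:
        return 0.0
    return (mean(flaring) - mean(quiet)) ** 2 / spread


def rank_features(table, labels):
    """Return feature names ordered from most to least discriminating."""
    scores = {name: separation_score(vals, labels) for name, vals in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy, invented data: three flaring regions followed by three quiet ones.
labels = [True, True, True, False, False, False]
table = {
    "energy": [9, 10, 11, 1, 2, 3],  # well separated between the groups
    "noise": [4, 5, 6, 4, 5, 6],     # identical in both groups
}
print(rank_features(table, labels))
```

A feature like the toy "energy" column scores high because the two groups barely overlap, while "noise" scores zero and could be dropped without hurting the prediction.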

However, this study only used information from the solar surface. That would be like trying to predict Earth’s weather from only surface measurements like temperature, without considering the wind and cloud cover. The next step in solar flare prediction would be to incorporate data from the sun’s atmosphere, Bobra said.

*Solar flares can release the energy equivalent of many atomic bombs, enough to cut out satellite communications and damage power grids on Earth, 93 million miles away. The flares arise from twisted magnetic fields that occur all over the sun’s surface, and they increase in frequency every 11 years, a cycle that is now at its maximum.


Abstract for Solar flare prediction using SDO/HMI vector magnetic field data with a machine-learning algorithm

We attempt to forecast M- and X-class solar flares using a machine-learning algorithm, called support vector machine (SVM), and four years of data from the Solar Dynamics Observatory’s Helioseismic and Magnetic Imager, the first instrument to continuously map the full-disk photospheric vector magnetic field from space. Most flare forecasting efforts described in the literature use either line-of-sight magnetograms or a relatively small number of ground-based vector magnetograms. This is the first time a large data set of vector magnetograms has been used to forecast solar flares. We build a catalog of flaring and non-flaring active regions sampled from a database of 2071 active regions, comprised of 1.5 million active region patches of vector magnetic field data, and characterize each active region by 25 parameters. We then train and test the machine-learning algorithm and we estimate its performances using forecast verification metrics with an emphasis on the true skill statistic (TSS). We obtain relatively high TSS scores and overall predictive abilities. We surmise that this is partly due to fine-tuning the SVM for this purpose and also to an advantageous set of features that can only be calculated from vector magnetic field data. We also apply a feature selection algorithm to determine which of our 25 features are useful for discriminating between flaring and non-flaring active regions and conclude that only a handful are needed for good predictive abilities.
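The true skill statistic mentioned in the abstract is conventionally defined as the hit rate minus the false-alarm rate, computed from the four counts of a yes/no forecast table. The short sketch below shows that standard definition; the function name and the example counts are invented for illustration and are not results from the study.

```python
def true_skill_statistic(hits, misses, false_alarms, correct_negatives):
    """TSS = hit rate minus false-alarm rate; ranges from -1 to +1."""
    if hits + misses == 0 or false_alarms + correct_negatives == 0:
        raise ValueError("need at least one flaring and one non-flaring case")
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_negatives)
    return hit_rate - false_alarm_rate


# Invented counts: 50 flaring regions and 950 non-flaring regions.
score = true_skill_statistic(hits=40, misses=10, false_alarms=50, correct_negatives=900)
print(round(score, 3))

# A forecaster that always says "no flare" catches nothing and scores zero,
# even though it is right about 95 percent of these regions.
print(true_skill_statistic(hits=0, misses=50, false_alarms=0, correct_negatives=950))
```

Because flares are rare, plain accuracy rewards a system that never predicts one; the second call shows that TSS gives such a forecaster no credit.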