2025 INFORMS Data Mining Society Data Challenge
1. Problem
Extreme weather events, such as thunderstorms, high winds, and heat waves, are among the leading causes of large-scale power outages in the United States. In recent years, these events have grown in both frequency and severity, placing increasing strain on electric utilities and emergency responders. Accurate short-term outage forecasting is essential for improving grid resilience, enabling proactive crew deployment, optimizing resource allocation, and ultimately minimizing societal and economic impacts.
Despite its importance, short-term outage prediction remains a challenging problem. Outage patterns are highly irregular, often featuring long periods of no events interspersed with sudden spikes during severe weather. The challenge is compounded by the high-frequency nature of the data, which captures rapid changes in both outages and weather but also increases noise and variability. Predictive performance is further limited by the scarcity of publicly available outage datasets paired with such fine-grained, high-resolution weather data.
To address this challenge, we have curated a high-resolution, hourly dataset of county-level outage counts and 108 weather features for Michigan (April–July 2023). This period features strong weather–outage correlation, making it ideal for benchmarking predictive models.
The goal of this competition is to develop models that predict short-term outages during extreme weather events, using historical outage and weather features, under realistic deployment constraints where future weather data during the test period will not be available.
2. Timeline
The competition runs until September 26, 2025. During the training phase, contestants use the provided training data to develop their models. The testing phase uses a holdout dataset for which only the competition committee knows the true values. The testing period is September 19 to September 26, and the holdout dataset will be released prior to September 11. The leaderboard will be updated twice on the Data Challenge webpage at https://sites.google.com/view/dmdaworkshop2025/data-challenge; the final ranking update will be posted on Saturday, September 27. Based on the final ranking, we will invite the top four competitors to the INFORMS Data Mining and Decision Analytics Workshop to present their solutions.
3. Data and Evaluation
The dataset contains hourly county-level outage counts and weather variables for all 83 counties in Michigan (FIPS codes 26001–26165). The training period is from 2023-04-01 00:00 to 2023-06-30 00:00 (data source: poweroutage.us).
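As a quick sanity check on the dimensions described above, the following minimal sketch enumerates the county codes and the hourly timestamps using only the Python standard library. Michigan county FIPS codes are the odd numbers from 26001 to 26165, which gives exactly 83 counties. Treating both endpoints of the training period as included is an assumption; the released file may index time differently.

```python
from datetime import datetime, timedelta

# Michigan county FIPS codes are the odd numbers 26001, 26003, ..., 26165.
fips_codes = [26000 + k for k in range(1, 166, 2)]

# Hourly timestamps over the stated training period.
# Assumption: both endpoints are included; the released file may differ.
start = datetime(2023, 4, 1, 0)
end = datetime(2023, 6, 30, 0)
n_hours = int((end - start).total_seconds() // 3600) + 1
timestamps = [start + timedelta(hours=h) for h in range(n_hours)]

print(len(fips_codes), "counties,", len(timestamps), "hourly timestamps")
```

Comparing these counts against the dimensions of the loaded dataset is a cheap way to catch missing counties or gaps in the time axis before modeling.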
The dataset contains 109 weather variables, covering temperature, humidity, wind, precipitation, severe weather, and other atmospheric and land surface conditions. These include:
- Temperature & Humidity – near-surface temperature and moisture conditions: t2m (2m temperature), d2m (2m dewpoint), sh2 (specific humidity), r2 (relative humidity), t, r, r_1
- Pressure & Geopotential Heights – atmospheric pressure and heights at multiple levels: pres, pres_1, pres_2, sp (surface pressure), pt (potential temperature), gh, gh_1 … gh_7 (geopotential heights at various levels)
- Wind & Turbulence – horizontal and vertical wind components, gusts, and turbulence measures: u, v, u10, v10, gust, max_10si, vucsh, vvcsh, ustm, vstm, wz, wz_1
- Severe Weather & Instability – parameters linked to storms, convection, and hail: cape, cape_1 (convective available potential energy), cin (convective inhibition), lftx, lftx4 (lifted indices), hail, hail_1, hail_2, ltng (lightning)
- Clouds & Radiation – cloud coverage and surface radiation fluxes: tcc, tcc_1 (total cloud cover), hcc (high cloud cover), mcc (medium cloud cover), lcc (low cloud cover), sdswrf (surface downward shortwave radiation), sdlwrf (surface downward longwave radiation), suswrf (surface upward shortwave), sulwrf (surface upward longwave)
- Precipitation & Hydrology – rain, snow, and related surface hydrology: prate (precipitation rate), tp (total precipitation), crain (categorical rain), cfrzr (freezing rain), cicep (ice pellets), csnow (snow), cpofp (probability of frozen precipitation), bgrun (baseflow runoff), ssrun (surface runoff)
- Other Environmental & Land Surface Variables – additional atmospheric, oceanic, and land surface parameters: mslma (mean sea-level pressure, MAPS system reduction), pwat (precipitable water), refc (composite reflectivity), refd, refd_1 (radar reflectivity at specific levels), aod (aerosol optical depth), veg (vegetation fraction), lai (leaf area index), vgtyp (vegetation type), orog (orography), vis (visibility), blh (boundary layer height), fsr (forecast surface roughness), gflux (ground heat flux), as well as various technical and diagnostic fields including veril, tcolw (total column water), tcoli (total column ice), plpl (pressure of the level from which a parcel was lifted), mstav, sdwe, sdwe_1, and layth.
Participants will build models to forecast outage counts at two horizons:
1. 24-hour horizon – day-ahead forecast starting June 30, 2023 at 01:00 and ending July 1, 2023 at 00:00.
2. 48-hour horizon – two-day forecast starting June 30, 2023 at 01:00 and ending July 2, 2023 at 00:00.
The final ranking will be based on the average rank across the two horizons using Root Mean Squared Error (RMSE) as the metric.
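The scoring rule can be sketched as follows. This is an illustration under stated assumptions, not the official evaluation code: it assumes RMSE is pooled over all county-hour predictions in a horizon and that tied teams share the smallest rank. The team names and score values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error over paired sequences of outage counts."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def rank_scores(scores):
    """Rank teams by score, lowest first; ties share the smallest rank (assumption)."""
    ordered = sorted(scores.values())
    return {team: ordered.index(score) + 1 for team, score in scores.items()}

def average_rank(scores_24h, scores_48h):
    """Average each team's rank across the two forecast horizons."""
    r24, r48 = rank_scores(scores_24h), rank_scores(scores_48h)
    return {team: (r24[team] + r48[team]) / 2 for team in scores_24h}

# Hypothetical example: made-up observations and one team's predictions.
example_rmse = rmse([0, 2, 10, 4], [0, 2, 8, 4])

# Hypothetical per-horizon RMSE values for three made-up teams.
scores_24h = {"team_a": 1.0, "team_b": 2.5, "team_c": 1.8}
scores_48h = {"team_a": 2.2, "team_b": 2.0, "team_c": 2.4}
final = average_rank(scores_24h, scores_48h)
print(example_rmse, final)
```

Because the final score averages ranks rather than raw RMSE values, a model that is moderately good at both horizons can place ahead of one that is excellent at one horizon and poor at the other.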
To help participants get started, we have provided a sample demonstration using a simple seq2seq model for outage forecasting. This example shows how to load the provided NetCDF dataset, fit a model, and generate predictions. We have also included evaluation code for the 24-hour and 48-hour forecast horizons. Please note that test_24h_demo.nc and test_48h_demo.nc in the data files are sample datasets only, not the actual test sets.
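Independently of the provided demonstration, a seasonal-naive baseline can serve as a reference point for any fitted model. The sketch below is not part of the competition materials. It forecasts each county by repeating its last 24 observed hourly outage counts, so it uses only historical outages and needs no weather data from the test period. The function name, the dictionary layout, and the numbers are hypothetical.

```python
def seasonal_naive(history, horizon, period=24):
    """Forecast `horizon` steps by repeating the last `period` observed values."""
    if len(history) < period:
        raise ValueError("need at least one full period of history")
    last_cycle = history[-period:]
    return [last_cycle[h % period] for h in range(horizon)]

# Hypothetical hourly outage counts keyed by county FIPS code (made-up numbers).
county_history = {
    26001: [0] * 40 + [3, 5, 9, 4, 1, 0, 0, 0],
    26003: [1] * 48,
}

# One forecast per county for each of the two horizons.
forecast_24h = {fips: seasonal_naive(series, 24) for fips, series in county_history.items()}
forecast_48h = {fips: seasonal_naive(series, 48) for fips, series in county_history.items()}

print(forecast_24h[26001][-8:])
```

A model that cannot beat this baseline on held-out training weeks is unlikely to be extracting useful signal from the weather features.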
4. Submission
- Each team needs to submit its predictions on the test data through a Google form. Closer to the testing period, we will provide the testing set as well as a template for predictions.
- We will release the rankings on September 20th and September 27th. The final ranking is the one released on September 27th and will be used to determine the finalists.
- One member in each team needs to be identified as the primary contact and provide their email in the submission form.
5. Written Report
All teams invited as finalists are required to submit a report of at most 6 pages summarizing their methodologies, due on October 15th, 2025 (Anywhere on Earth). Winners will be chosen by a panel of judges based on the presentations and the reports.
6. Prize
The four finalists will be chosen directly from the numerical results, using the average ranking across the two forecast horizons. Each finalist team will receive a monetary prize and one complimentary registration code for the workshop, but will still need to pay for the main INFORMS conference registration (if attending). The final placement among the finalists will be based on the quality of the presentation and the written methodology, as judged by our panel.
- First Place Prize: 1000 USD
- Second Place Prize: 500 USD
- Third Place Prize: 250 USD
- Fourth Place Prize: 125 USD
7. Registration
Please use the following link to register for the competition: HERE. After registration, you will be emailed a link to access the training data. We will notify each registered member when the testing data becomes available on September 11, 2025. You can also check the Data Challenge Webpage for updated data and leaderboard information.
Competition Chairs
Shuyi Chen
Ph.D. Student
Heinz College of Information Systems and Public Policy
Carnegie Mellon University
shuyic@alumni.cmu.edu
Feng Qiu
Group Leader, Advanced Grid Modeling - Optimization and Analytics
Argonne National Laboratory
fqiu@anl.gov
Hieu Pham
Assistant Professor
Department of Information Systems, Supply Chain, and Analytics, College of Business
The University of Alabama in Huntsville
hieu.pham@uah.edu
Shouyi Wang
Associate Professor
Department of Industrial, Manufacturing & Systems Engineering, College of Engineering
The University of Texas at Arlington
shouyiw@uta.edu
Shixiang (Woody) Zhu
Assistant Professor
Heinz College of Information Systems and Public Policy
Carnegie Mellon University
shixianz@andrew.cmu.edu
------------------------------
Shouyi Wang, Ph.D.
Associate Professor
Center on Stochastic Modeling, Optimization, and Statistics
Department of Industrial, Manufacturing & Systems Engineering
The University of Texas at Arlington
Email: shouyiw@uta.edu
Web: https://mentis.uta.edu/explore/profile/shouyi-wang
------------------------------