NSF CAREER: Developing Spatiotemporal Relational Models to Anticipate Tornado Formation
Start date: July 1, 2008
Abstract and Project Goals
The goal of this research is to revolutionize the ability to anticipate tornadoes by developing advanced techniques for statistical pattern discovery in spatially and temporally varying relational data. These models are applied to complete fields of meteorological quantities obtained through data assimilation and simulation. Doppler radar data is limited and, while modern data assimilation techniques allow the unobserved quantities to be estimated, the resulting four-dimensional fields are too complicated for the extraction of meaningful, repeatable patterns by either humans or current data mining techniques. By studying a full field of variables, the models can identify critical interactions among high level features. The models are developed and verified in close collaboration with domain experts.
The interdisciplinary research is used to improve retention and recruitment in computer science (CS). This draws on recent evidence that underrepresented groups are not drawn to computing careers because they do not appreciate how computing can be used to solve real-world problems. Introducing authentic projects into both early CS and meteorology classes is intended to increase the number of technically trained students in both majors.
The primary broader impact of this research is to society, through the potential to reduce the loss of human life, property, and money. Models will be made available to operational meteorologists as they are verified. Another broader impact will come from increasing the number of computing-oriented minors and majors through authentic projects. All data and results will be disseminated through peer-reviewed publications and open-source online repositories.
Students
Collaborators
Research Challenges
- The research challenges for this project span both computer science and meteorology.
- One of the primary CS goals for this project is to automatically create human-readable networks of dependencies in large dynamic relational data sets, specifically focusing on severe storms. Creating a machine learning technique that accomplishes this in near real time poses several challenges. The primary challenge is the exponential increase in model complexity introduced by temporal data and temporal dependencies. Search techniques must handle this growth efficiently, or the models cannot be learned. To address this issue, we will draw on our success with Relational U-Tree (Dabney & McGovern, 2007), where we used stochastic sampling as introduced by Srinivasan (1999) in conjunction with temporal sampling to address temporal autocorrelation issues. A second challenge is that weather data will always be partially observable, since one cannot sense the state of every molecule in the atmosphere. Finally, learning a human-readable model of dynamic data requires representing both the dynamic relational data and the model itself. The representations need to be simple enough to be easily understood by scientists outside computer science yet rich enough not to overly bias the model.
- In meteorology, the key challenges (and contributions) come from our novel use of a complete field of meteorological variables. As an integral part of applying the proposed model to tornado anticipation, we will develop automated techniques to identify and extract high-level storm features such as regions of potentially dangerous winds, large hail, or torrential rain. Identifying and tracking such features in near real time is critical to successful model learning, yet it presents difficulties that include object permanence and data overload.
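To make the feature-extraction and object-permanence challenges concrete, here is a minimal sketch in Python. It is not the project's algorithm: it assumes a storm feature can be approximated as a connected region of grid cells whose value exceeds a threshold, and that a region keeps its identity across time steps when it overlaps a region in the previous frame. The function names, the toy grids, and the threshold of 40 are all hypothetical and chosen only for illustration.

```python
from collections import deque


def label_regions(field, threshold):
    """Label 4-connected regions of cells whose value is >= threshold.

    Returns (labels, count). labels has the same shape as field, with 0 for
    background and 1..count for the regions, numbered in scan order.
    """
    rows, cols = len(field), len(field[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if field[r][c] < threshold or labels[r][c]:
                continue  # background cell, or already part of a region
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the new region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and field[ny][nx] >= threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def match_regions(prev_labels, next_labels):
    """Map each region in next_labels to the previous region it overlaps most.

    A region with no overlapping predecessor maps to None (a new feature).
    """
    overlap = {}
    for prev_row, next_row in zip(prev_labels, next_labels):
        for p, n in zip(prev_row, next_row):
            if n:
                counts = overlap.setdefault(n, {})
                if p:
                    counts[p] = counts.get(p, 0) + 1
    return {n: (max(c, key=c.get) if c else None) for n, c in overlap.items()}


# Two hypothetical time steps of a 2D field (values are illustrative only).
frame_a = [
    [0, 45, 50, 0, 0],
    [0, 48, 0, 0, 0],
    [0, 0, 0, 0, 42],
]
frame_b = [
    [0, 0, 47, 51, 0],
    [0, 0, 49, 0, 0],
    [41, 0, 0, 0, 0],
]

labels_a, count_a = label_regions(frame_a, threshold=40)
labels_b, count_b = label_regions(frame_b, threshold=40)
matches = match_regions(labels_a, labels_b)
print(count_a, count_b, matches)
```

In this toy example the first region in the second frame inherits the identity of the first region in the earlier frame, while the second region has no predecessor and is treated as new. Real storm fields are three-dimensional, noisy, and far larger, and storms split and merge, so overlap matching of this kind is only a starting point.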
Current/Final Results
- Our latest results are under blind review at a conference and will be posted as soon as the review process completes. Older results can be found below in presentations.
Publications
- McGovern, Amy and Hiers, Nathan and Collier, Matthew and Gagne II, David J. and Brown, Rodger A. (2008). Spatiotemporal Relational Probability Trees. To appear in the 2008 IEEE International Conference on Data Mining. [pdf (326K)]
- Collier, Matthew and McGovern, Amy (2008). Kernels for the Investigation of Localized Spatiotemporal Transitions of Drought with Support Vector Machines. To appear in the International Workshop on Spatial and Spatiotemporal Data Mining (SSTDM-08) to be held at the International Conference on Data Mining (ICDM 2008). [pdf (362K)]
Presentations
The following presentations highlight our preliminary results leading up to the CAREER award.
- Hiers, Nathan; McGovern, Amy; Rosendahl, Derek H.; Brown, Rodger A.; Droegemeier, Kelvin K. (2008). Using Spatiotemporal Relational Data Mining to Identify the Key Parameters for Anticipating Rotation Initiation in Simulated Supercell Thunderstorms. Preprints of the Sixth Conference on Artificial Intelligence and its Applications to the Environmental Sciences, joint session with the 24th Conference on International Interactive Information and Processing Systems (IIPS) for Meteorology, Oceanography, and Hydrology.
- McGovern, Amy, and Rosendahl, Derek H., and Kruger, Adrianna, and Beaton, Meredith G., and Brown, Rodger A., and Droegemeier, Kelvin K. (2007). Understanding the formation of tornadoes through data mining. Preprints of the Fifth Conference on Artificial Intelligence and its Applications to Environmental Sciences at the American Meteorological Society annual conference.
Images/Videos
- From the 2008 AMS meeting: Three-dimensional view of a storm taken every 30 seconds for the storm's lifetime. The red regions show strong updrafts, the blue regions show strong downdrafts, and the green and yellow regions show strong vorticity. This storm was generated using ARPS.
- From the 2008 AMS meeting: Two-dimensional movie of storm regions and the tracking algorithm. The left panel shows each storm region (highlighted in color) and the right panel shows the reflectivity of the storm at 4 km. These storms were generated using ARPS. This algorithm is used to create our relational data as well.
- From the 2007 AMS meeting: Two-dimensional movie of storm tracking and splitting. The left panel shows the reflectivity at 4 km and the right panel shows the storm regions. Each distinct color is a distinct storm.
Data
- The full set of simulated storms used in the 2007 and 2008 AMS presentations is available by request. Note that the total size is nearly 6 TB.
- Due to a publication under review, the metadata is currently not online. As soon as the paper is published, the metadata will be made available here.
Broader Impacts
- Broader impacts will be listed and highlighted as the project continues. The classroom work will begin in the fall of 2008. REU students begin in August of 2008.
Highlights, Press, Awards, Demos, ...
Coming soon!
Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant No. IIS-0746816. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Point of contact: Amy McGovern. Last updated July 22, 2008 4:54 PM