SCS Technical Talk

  • Gates & Hillman Centers
  • Reddy Conference Room 4405
  • Manager
  • Scalable Modeling and Analysis Systems Department
  • Sandia National Laboratories, Livermore

Fault-Tolerant Computing at Exascale - A Quiet Revolution in Progress

As we advance toward exascale-class high-performance computing (HPC) machines, there is a growing realization that major challenges and changes lie ahead for application developers. Some of the key assumptions central to the current HPC programming paradigm will not hold at this scale. Resilience issues and ‘portable performance’ are expected to require paradigm shifts in the way we develop and run our applications. New programming models, as well as new machine and application abstractions, are emerging to address these challenges. Collectively, these models and approaches may represent a revolution in the way we develop, deploy, and run applications on extreme-scale HPC machines. In this talk, we will examine some of the drivers of this shift and survey some of the work in HPC resilience R&D at Sandia National Labs.


Dr. Robert L. Clay is the manager of the Scalable Modeling and Analysis Systems Department at Sandia National Laboratories in Livermore, CA. He is responsible for research and development in HPC systems resilience and programming models as part of the ASC and ASCR (exascale) programs. He is also responsible for R&D in discrete system analysis (complex systems, formal methods), scalable data analysis, and engineering workflow and model-building systems. In these roles he provides leadership across a broad range of activities, with a core focus on scalable systems design and analysis.

Dr. Clay is a graduate of Carnegie Mellon University (Ph.D.) and the University of Tennessee (B.S.), where he received degrees in Chemical Engineering. His graduate work focused on planning under uncertainty, where he worked on parallel stochastic programming methods and Bayesian inference schemes. Before joining Sandia National Labs, he worked at Exxon Research and Engineering in Florham Park, NJ, in the Systems Engineering Division, where he led projects in real-time optimization, advanced computational control, and dynamic system modeling. Dr. Clay also served as VP and Chief Scientist for Terascale LLC, where he was involved in the development of parallel FEM tools, codes, and services.

For More Information, Please Contact: