Debugging high-performance computing applications at massive scales

Ignacio Laguna, Dong H. Ahn, Bronis R. De Supinski, Todd Gamblin, Gregory L. Lee, Martin Schulz, Saurabh Bagchi, Milind Kulkarni, Bowen Zhou, Zhezhe Chen, Feng Qin

Research output: Contribution to journal, Article, peer-review

26 Citations (Scopus)

Abstract

Experts state that dynamic analysis techniques help programmers find the root cause of bugs in large-scale high-performance computing (HPC) parallel applications. These applications often run detailed numerical simulations that model the real world. Their numerical correctness and software reliability are major concerns for scientists, given the public importance of the scientific advances they enable. Programmers can meet these goals with a set of techniques that build on one another to accomplish large-scale debugging. These techniques support three tasks. The first is discovering scaling bugs, which manifest themselves when the application is deployed at large scale. The second is behavioral debugging by modeling the control-flow behavior of tasks. The third is detecting software defects at the communication layer.

Original language: English
Pages (from-to): 72-81
Number of pages: 10
Journal: Communications of the ACM
Volume: 58
Issue number: 9
Publication status: Published, 01 Sept 2015

ASJC Scopus subject areas

  • General Computer Science
