Abstract:
|
A powerful and widely used method for analyzing the performance behavior of
parallel programs is event tracing. When an application is traced, performance-relevant
events, such as entering functions or sending messages, are recorded at runtime
and analyzed post-mortem to identify and potentially remove performance problems.
While event tracing enables the detection of performance problems at a high
level of detail, growing trace-file size often constrains its scalability on large-scale
systems and complicates management, analysis, and visualization of trace data. In this
article, we survey current approaches to handling large traces and classify them according
to the primary issues they address and the primary benefits they offer. |