Large event traces in parallel performance analysis.
Wolf, Felix; Freitag, Fèlix; Mohr, Bernd; Moore, Shirley; Wylie, Brian J. N.
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. DSG - Distributed Systems Group
A powerful and widely used method for analyzing the performance behavior of parallel programs is event tracing. When an application is traced, performance-relevant events, such as entering functions or sending messages, are recorded at runtime and analyzed post-mortem to identify and potentially remove performance problems. While event tracing enables the detection of performance problems at a high level of detail, growing trace-file size often constrains its scalability on large-scale systems and complicates the management, analysis, and visualization of trace data. In this article, we survey current approaches to handling large traces and classify them according to the primary issues they address and the primary benefits they offer.
