The Lightweight Distributed Metric Service (LDMS) is a scalable, low-overhead High Performance Computing (HPC) monitoring framework for transporting system resource utilization data as well as application and workflow progress and performance information. LDMS also includes plugins for pre-storage analysis and for a variety of storage methods, including publication to a Kafka distributed event bus. Because it supports bi-directional data flow, LDMS can also serve as a low-latency substrate for communicating conditions of interest from an analysis system back to system and/or application software, enabling run-time modification of behavior. This seminar will present the salient features of the LDMS ecosystem, how it is currently being deployed at other supercomputing sites, and current production and research activities in analysis, visualization, and active feedback. The seminar will also introduce the WorkVisualizer framework, an open-source profiling tool developed by NexGen Analytics (NGA) that offers high-level, interactive, visual HPC performance analysis. LDMS seeks to integrate WorkVisualizer into its ecosystem to help convert vast amounts of monitoring data into actionable intelligence.
Conference report
English
UPC subject areas::Computer science::Computer architecture; High performance computing; Intensive computing (Computer science)
Barcelona Supercomputing Center
http://creativecommons.org/licenses/by-nc-nd/4.0/
Open Access