SORS: Improving HPC performance and throughput through monitoring, analysis, and feedback

dc.contributor.author
Brandt, Jim
dc.date.accessioned
2026-02-11T01:37:20Z
dc.date.available
2026-02-11T01:37:20Z
dc.date.issued
2025-06-16
dc.identifier
Brandt, J. SORS: Improving HPC performance and throughput through monitoring, analysis, and feedback. In: Severo Ochoa Research Seminars at BSC. «10th Severo Ochoa Research Seminar Lectures at BSC, Barcelona, 2024-25». Barcelona: Barcelona Supercomputing Center, 2025, p. 148-149.
dc.identifier
https://hdl.handle.net/2117/454301
dc.identifier.uri
http://hdl.handle.net/2117/454301
dc.description.abstract
The Lightweight Distributed Metric Service (LDMS) is a scalable, low-overhead High Performance Computing (HPC) monitoring framework for transporting system resource utilization data as well as application/workflow progress and performance information. LDMS also includes plugins for pre-storage analysis and for a variety of storage methods, including publication to a Kafka distributed event bus. Additionally, because it supports bi-directional data flow, LDMS can be used as a low-latency substrate for communicating conditions of interest from an analysis system back to system and/or application software, enabling run-time modification of behavior. This seminar will present the salient features of the LDMS ecosystem, how it is currently being deployed at other supercomputing sites, and current production and research activities in analysis, visualization, and active feedback. Furthermore, this seminar will introduce the WorkVisualizer framework, an open-source profiling tool developed by NexGen Analytics (NGA) that offers high-level, interactive, and visual HPC performance analysis. LDMS seeks to integrate WorkVisualizer into its ecosystem to help convert vast amounts of monitoring data into actionable intelligence.
dc.format
2 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
Barcelona Supercomputing Center
dc.rights
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights
Open Access
dc.subject
UPC subject areas::Computer science::Computer architecture
dc.subject
High performance computing
dc.subject
High-performance computing (Computer science)
dc.title
SORS: Improving HPC performance and throughput through monitoring, analysis, and feedback
dc.type
Conference report


This item appears in the following collection(s)

Congressos [11156]