Title:
|
Iteration-fusing conjugate gradient
|
Author(s):
|
Zhuang, Sicong; Casas, Marc
|
Other authors:
|
Barcelona Supercomputing Center |
Abstract:
|
This paper presents the Iteration-Fusing Conjugate Gradient (IFCG) approach, an evolution of the Conjugate Gradient method that consists of i) letting computations from different iterations overlap and ii) splitting linear algebra kernels into subkernels to increase concurrency and relax data dependencies. The paper presents two ways of applying the IFCG approach: the IFCG1 algorithm, which aims at hiding the cost of parallel reductions, and the IFCG2 algorithm, which aims at reducing idle time by starting computations as soon as possible. IFCG1 and IFCG2 are complementary approaches to increasing parallel performance. Extensive numerical experiments compare the numerical stability and performance of IFCG1 and IFCG2 against four state-of-the-art techniques. Using a set of representative input matrices, the paper demonstrates that IFCG1 and IFCG2 provide parallel performance improvements of up to 42.9% and 41.5%, respectively, and average improvements of 11.8% and 7.1% with respect to the best state-of-the-art techniques, while keeping similar numerical stability properties. The paper also evaluates the sensitivity of the IFCG algorithms to system noise and demonstrates that they run 18.0% faster on average than the best state-of-the-art technique under realistic degrees of system noise. |
Funding:
|
This work has been supported by the Spanish Government (Severo Ochoa grant SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316), by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), and by the IBM/BSC Deep Learning Center Initiative. |
Peer review:
|
Peer Reviewed |
Subject(s):
|
-UPC subject areas::Electrical engineering -Parallel programming (Computer science) -Parallel computers -High performance computing -Computing methodologies -Parallel algorithms -Sparse linear algebra -Overlap between iterations -Mitigation of synchronization costs -Task parallelism -Parallel processing (Computers) -Supercomputers |
Rights:
|
Attribution-NonCommercial-NoDerivs 3.0 Spain
http://creativecommons.org/licenses/by-nc-nd/3.0/es/ |
Document type:
|
Article - Submitted version; Conference object |
Publisher:
|
Association for Computing Machinery (ACM)
|