dc.contributor |
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors |
dc.contributor |
Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
dc.contributor.author |
Hayes, Timothy |
dc.contributor.author |
Palomar Pérez, Óscar |
dc.contributor.author |
Unsal, Osman Sabri |
dc.contributor.author |
Cristal Kestelman, Adrián |
dc.contributor.author |
Valero Cortés, Mateo |
dc.date |
2015 |
dc.identifier.citation |
Hayes, T., Palomar, O., Unsal, O., Cristal, A., Valero, M. VSR sort: a novel vectorised sorting algorithm and architecture extensions for future microprocessors. In: International Symposium on High-Performance Computer Architecture. "2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA 2015): Burlingame, California, USA: 7-11 February 2015". San Francisco Bay Area, California: Institute of Electrical and Electronics Engineers (IEEE), 2015, pp. 26-38. |
dc.identifier.citation |
978-1-4799-8931-7 |
dc.identifier.citation |
10.1109/HPCA.2015.7056019 |
dc.identifier.uri |
http://hdl.handle.net/2117/77204 |
dc.language.iso |
eng |
dc.publisher |
Institute of Electrical and Electronics Engineers (IEEE) |
dc.relation |
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7056019 |
dc.rights |
info:eu-repo/semantics/openAccess |
dc.subject |
UPC subject areas::Telecommunication engineering |
dc.subject |
UPC subject areas::Computer science |
dc.subject |
Microprocessors |
dc.subject |
Supercomputers |
dc.subject |
Digital arithmetic |
dc.subject |
Microprocessor chips |
dc.subject |
Parallel architectures |
dc.subject |
Sorting |
dc.title |
VSR sort: a novel vectorised sorting algorithm and architecture extensions for future microprocessors |
dc.type |
info:eu-repo/semantics/publishedVersion |
dc.type |
info:eu-repo/semantics/conferenceObject |
dc.description.abstract |
Sorting is a widely studied problem in computer science and an elementary building block in many of its subfields. There are several known techniques to vectorise and accelerate a handful of sorting algorithms by using single instruction, multiple data (SIMD) instructions. The width and capabilities of SIMD support are expected to improve dramatically in future microprocessor generations, and it is not yet clear whether these sorting algorithms will be suitable, let alone optimal, when executed on such hardware. This work extrapolates the level of SIMD support in future microprocessors and evaluates these algorithms using a simulation framework. The scalability, strengths and weaknesses of each algorithm are derived experimentally. We then propose VSR sort, our own novel vectorised non-comparative sorting algorithm based on radix sort. To facilitate the execution of this algorithm, we define two new SIMD instructions and propose a complementary hardware structure for their execution. Our results show that VSR sort achieves maximum speedups between 14.9x and 20.6x over a scalar baseline and an average speedup of 3.4x over the next-best vectorised sorting algorithm. |
dc.description.abstract |
The authors would like to thank Morteza Biglari-Abhari for reading this work. We are largely indebted to the efforts of Michael Swift, who meticulously reviewed and critiqued the text multiple times, leading to a considerably improved publication. We would also like to thank all of our anonymous reviewers, especially reviewer B for his or her highly detailed and insightful comments, which helped develop the article. The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under the AXLE project (GA no. 318633) and from the RoMoL ERC Advanced Grant (GA no. 321253). Timothy Hayes is also supported by an FPU research grant from the Spanish MECD. |
dc.description.abstract |
Peer Reviewed |