To access the full-text documents, please follow this link: http://hdl.handle.net/10459.1/60159
Title:
Deduplication of Universitat de Lleida scholarly data
Author:
Berga Gatius, Albert
Other authors:
García González, Roberto; Universitat de Lleida. Escola Politècnica Superior
Notes:
In this project we used data science tools and techniques to detect duplicated data in the GREC repository, which contains information about the articles published by University of Lleida staff. We used locality-sensitive hashing (LSH) to group articles so that those most likely to be duplicates fall into the same group. We then compared the articles within each group pairwise to determine which pairs refer to the same article.
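The two-stage approach described in the notes (LSH grouping, then pairwise comparison within each group) can be illustrated with a minimal sketch. This is not the thesis implementation: the use of MinHash over character trigrams of titles, the banding parameters, the Jaccard threshold, and every function name below are assumptions chosen for illustration, and the code uses only the Python standard library rather than Spark.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

# All parameters below are illustrative assumptions, not values from the thesis.
NUM_HASHES = 32              # length of each MinHash signature
BANDS = 16                   # number of LSH bands
ROWS = NUM_HASHES // BANDS   # signature rows per band


def shingles(text, k=3):
    """Return the set of character k-grams of a whitespace/case-normalized title."""
    t = " ".join(text.lower().split())
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash(shingle_set):
    """MinHash signature: the minimum hash value per seed."""
    return tuple(min(_hash(seed, s) for s in shingle_set)
                 for seed in range(NUM_HASHES))


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def find_duplicates(titles, threshold=0.7):
    """Stage 1: bucket titles by LSH bands. Stage 2: compare pairs per bucket."""
    sets = [shingles(t) for t in titles]
    buckets = defaultdict(set)
    for idx, s in enumerate(sets):
        sig = minhash(s)
        for b in range(BANDS):
            band = sig[b * ROWS:(b + 1) * ROWS]
            buckets[(b, band)].add(idx)

    # Only titles that share at least one bucket become candidate pairs.
    candidates = set()
    for members in buckets.values():
        candidates.update(combinations(sorted(members), 2))

    # Confirm each candidate with the exact similarity measure.
    return sorted((i, j) for i, j in candidates
                  if jaccard(sets[i], sets[j]) >= threshold)
```

Calling `find_duplicates` on a list of titles returns index pairs judged to be duplicates. The point of the first stage is that only titles sharing a bucket are compared, which avoids comparing every article against every other one; the exact comparison in the second stage then filters out false candidates.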
Subjects:
-Spark -Big data -Data mining -Data science -Macrodades -Mineria de dades
Rights:
cc-by-nc-nd
http://creativecommons.org/licenses/by-nc-nd/4.0/
Document type:
masterThesis