To access the full-text documents, please follow this link: http://hdl.handle.net/10459.1/60159

Deduplication of Universitat de Lleida scholarly data
Berga Gatius, Albert
García González, Roberto; Universitat de Lleida. Escola Politècnica Superior
In this project, we used data science tools and techniques to detect duplicate records in the GREC repository, which contains information about the articles published by University of Lleida staff. We used locality-sensitive hashing (LSH) to group the articles so that those most likely to be duplicates are assigned to the same group. We then compared the articles within each group pairwise to determine which pairs refer to the same article.
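The abstract does not say which fields, hash family, or thresholds the thesis used, and its keyword list points to Spark rather than single-machine code. The following is therefore only a minimal Python sketch of the general two-step idea, MinHash LSH with banding followed by pairwise comparison of candidates. It assumes, hypothetically, that articles are compared by character 3-shingles of their titles using Jaccard similarity. The function names, the 16 bands × 4 rows split, and the 0.8 threshold are illustrative choices, not the thesis's.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Character k-shingles of a whitespace- and case-normalized string.
    (Hypothetical feature choice: the thesis's actual features are not stated.)"""
    t = " ".join(text.lower().split())
    if len(t) < k:
        return {t}
    return {t[i:i + k] for i in range(len(t) - k + 1)}


def minhash_signature(shingle_set, num_hashes):
    """For each seeded hash function, keep the minimum hash over the set.
    Two sets agree on a position with probability equal to their Jaccard similarity."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in shingle_set))
    return signature


def lsh_candidate_pairs(docs, bands=16, rows=4):
    """Split each signature into bands; documents sharing any band bucket become candidates."""
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), bands * rows)
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        for x, y in combinations(sorted(ids), 2):
            pairs.add((x, y))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


def find_duplicates(docs, threshold=0.8, bands=16, rows=4):
    """Pairwise comparison restricted to LSH candidates (hypothetical 0.8 threshold)."""
    sh = {doc_id: shingles(text) for doc_id, text in docs.items()}
    return sorted(
        (x, y) for x, y in lsh_candidate_pairs(docs, bands, rows)
        if jaccard(sh[x], sh[y]) >= threshold)


if __name__ == "__main__":
    # Made-up titles for illustration only, not GREC data.
    docs = {
        "a1": "Deduplication of scholarly data with LSH",
        "a2": "Deduplication of scholarly data with LSH.",
        "a3": "A survey of graph databases",
    }
    print(find_duplicates(docs))
```

With b bands of r rows each, a pair with Jaccard similarity s becomes a candidate with probability 1 − (1 − s^r)^b. This is why banding keeps near-duplicates together while sparing most of the exhaustive pairwise comparisons. Raising r makes the filter stricter, and raising b makes it more permissive.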
-Spark
-Big data
-Data mining
-Data science
cc-by-nc-nd
http://creativecommons.org/licenses/by-nc-nd/4.0/
masterThesis

Full text of this document: abergag.pdf (1.914 MB, application/pdf)
