Notes:
|
Person name disambiguation is essential for distinguishing people who share the same name when
unique identifiers are not available. This problem is common in many domains, including digital
libraries and publication databases, where the same name can refer to several distinct
authors. To attribute work correctly, such databases must be disambiguated.
This project proposes a possible solution to this problem by designing and implementing a
name disambiguation algorithm. Distributed computing tools and techniques, such as Spark
and Hadoop, will be used during development to improve the efficiency of the
process.
As a base data set, more than 8 million publications from the AGRIS (International System
for Agricultural Science and Technology) repository will be used in the disambiguation process. |