Abstract:
|
A typical application of approximate string matching (ASM) is the
matching of personal names, for example when searching for people in the
database of an information system. Over the years, several similarity
functions have been proposed: phonetic codes, simple edit distance,
n-gram distances, etc. This report presents a function, DEA, that is
substantially more effective than existing ones and is oriented mainly
toward Spanish surnames. The DEA distance is an edit distance whose costs
are based on the probabilities of the operations, characters, and
positions. The distance threshold is defined as a function of the length
of the string. The efficacy of DEA is evaluated objectively, without
human relevance judgements. |
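
The general idea described in the abstract, an edit distance with non-uniform costs combined with a length-dependent threshold, can be illustrated with a minimal sketch. This is not the DEA function itself: the cost tables, the position weighting, the linear threshold, and all names (`sub_cost`, `indel_cost`, `threshold`, `is_match`) are assumptions chosen for illustration, since the abstract does not give the actual probabilities or the threshold formula.

```python
# Illustrative sketch only: NOT the actual DEA costs or threshold.
# All constants below are hypothetical placeholders.

# Assumed: letter pairs often confused in Spanish spelling are cheaper to swap.
CHEAP_PAIRS = {frozenset("bv"), frozenset("sz"), frozenset("yi")}


def position_weight(pos: int) -> float:
    """Assumed: errors in the first character are rarer, so they cost more."""
    return 1.5 if pos == 0 else 1.0


def sub_cost(a: str, b: str, pos: int) -> float:
    """Cost of substituting character a with b at position pos."""
    if a == b:
        return 0.0
    base = 0.5 if frozenset((a, b)) in CHEAP_PAIRS else 1.0
    return base * position_weight(pos)


def indel_cost(c: str, pos: int) -> float:
    """Cost of inserting or deleting c at pos (assumed: silent 'h' is cheap)."""
    base = 0.5 if c == "h" else 1.0
    return base * position_weight(pos)


def weighted_edit_distance(s: str, t: str) -> float:
    """Standard dynamic-programming edit distance with per-operation costs."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + indel_cost(s[i - 1], i - 1)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + indel_cost(t[j - 1], j - 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost(s[i - 1], i - 1),   # delete from s
                d[i][j - 1] + indel_cost(t[j - 1], j - 1),   # insert from t
                d[i - 1][j - 1]
                + sub_cost(s[i - 1], t[j - 1], min(i - 1, j - 1)),
            )
    return d[m][n]


def threshold(length: int) -> float:
    """Assumed linear threshold; the real function of length is not given."""
    return 0.25 * length


def is_match(query: str, candidate: str) -> bool:
    """Accept the candidate if its distance is within the query's threshold."""
    return weighted_edit_distance(query, candidate) <= threshold(len(query))
```

With these placeholder costs, `is_match("vazquez", "vasquez")` accepts the pair because the single s/z substitution is cheap, while `is_match("garcia", "lopez")` rejects it. Making the threshold depend on length lets longer surnames tolerate more accumulated error than short ones.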