Evaluating geographical knowledge re-ranking, linguistic processing and query expansion techniques for geographical information retrieval
Ferrés Domènech, Daniel; Rodríguez Hontoria, Horacio
Universitat Politècnica de Catalunya. Departament de Ciències de la Computació; Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
This paper describes and evaluates the use of Geographical Knowledge Re-Ranking, Linguistic Processing, and Query Expansion techniques to improve Geographical Information Retrieval effectiveness. Geographical Knowledge Re-Ranking uses Geographical Gazetteers and conservative Toponym Disambiguation techniques to boost the ranking of the geographically relevant documents retrieved by standard state-of-the-art Information Retrieval algorithms. Linguistic Processing is performed in two ways: 1) Part-of-Speech tagging and Named Entity Recognition and Classification are applied to the text collections and topics to detect toponyms; 2) Stemming (Porter’s algorithm) and Lemmatization are applied in combination with default stopword filtering. The Query Expansion methods tested are the Bose-Einstein (Bo1) and Kullback-Leibler term weighting models. The experiments use the English Monolingual test collections of the GeoCLEF evaluations (2005, 2006, 2007, and 2008), with the TF-IDF, BM25, and InL2 Information Retrieval algorithms over unprocessed texts as baselines. They were run on each GeoCLEF test collection separately (25 topics per evaluation) and on the fusion of all four collections (100 topics). Evaluated separately on the fusion of all the topics, Geographical Knowledge Re-Ranking, Linguistic Processing (lemmatization, stemming, and the combination of both), and Query Expansion each improve the Mean Average Precision (MAP) and R-Precision effectiveness measures in all the experiments, and the improvements over the baselines are statistically significant in most of them. The best MAP and R-Precision results are obtained with the InL2 algorithm using Geographical Knowledge Re-Ranking, Lemmatization with Stemming, and Kullback-Leibler Query Expansion.
Some configurations with Geographical Knowledge Re-Ranking, Linguistic Processing, and Query Expansion improve on the MAP of the best official results at the GeoCLEF evaluations of 2005, 2006, and 2007.
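For readers unfamiliar with the two effectiveness measures named in the abstract, the following is a minimal sketch of their standard textbook definitions. It is not the authors' evaluation tooling; the function names, topic identifier, and document identifiers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def r_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents for the topic."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc_id in ranking[:r] if doc_id in relevant) / r


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-topic average precision over all judged topics."""
    if not qrels:
        return 0.0
    return sum(
        average_precision(runs.get(topic, []), relevant)
        for topic, relevant in qrels.items()
    ) / len(qrels)


# Hypothetical single-topic example (identifiers are made up).
example_runs = {"topic-1": ["d1", "d2", "d3", "d4"]}
example_qrels = {"topic-1": {"d1", "d3"}}
example_map = mean_average_precision(example_runs, example_qrels)
example_rprec = r_precision(example_runs["topic-1"], example_qrels["topic-1"])
```

In this toy case the relevant documents sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2, and R-Precision looks only at the top two results.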
Peer Reviewed
UPC subject areas::Computer science::Computer applications
UPC subject areas::Computer science::Artificial intelligence::Natural language
Information retrieval
Geographical gazetteers
Natural Language Processing
Toponym Disambiguation
Query Expansion
Effectiveness Measures
Geographical dictionaries