Abstract:
|
This work has two core objectives: first, to build a data set, and second, to customize a
search engine.
The first objective is to design and implement a data set builder, which requires two steps:
building a crawler and adding a cleaner. The crawler collects Web links; the cleaner extracts
the main content from the crawled files and removes noise. The goal of this application is to
crawl Web news sites to find the different sources of the news and retrieve the original
articles.
The second core objective is to customize a search engine, which also requires two steps:
enhancing the functionality of the search engine and integrating the enhanced search engine
with a different knowledge management platform. To enhance the search engine, meta-information
is added to its index, and the retrieval process is modified so that the selection of documents
to be retrieved takes this new meta-information into account. Integrating the search engine
with a different knowledge platform is a requirement of this project, so that pre-existing
repositories of knowledge can interact with the search engine efficiently and effectively.
This integration also includes the
development of a front end that allows users to work with the search engine. The goal of this
application is to provide users with a convenient environment for retrieving information based
on meta-information. |