To access the full-text documents, please follow this link: http://hdl.handle.net/2117/20275
dc.contributor | Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics |
---|---|
dc.contributor | Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural |
dc.contributor.author | Barrón-Cedeño, Alberto |
dc.contributor.author | Gupta, P. |
dc.contributor.author | Rosso, Paolo |
dc.date | 2013-09 |
dc.identifier.citation | Barrón-Cedeño, A.; Gupta, P.; Rosso, P. Methods for cross-language plagiarism detection. "Knowledge-based systems", September 2013, vol. 50, p. 211-217. |
dc.identifier.citation | 0950-7051 |
dc.identifier.citation | 10.1016/j.knosys.2013.06.018 |
dc.identifier.uri | http://hdl.handle.net/2117/20275 |
dc.description.abstract | Three factors are driving a rise in plagiarism across languages: (i) speakers of under-resourced languages often consult documentation in a foreign language, (ii) people living in a foreign country can still consult material written in their native language, and (iii) people are often interested in writing in a language other than their native one. Most efforts to detect cross-language plagiarism automatically depend on a preliminary translation, which is not always available. In this paper we propose a freely available architecture for plagiarism detection across languages covering the entire process: heuristic retrieval, detailed analysis, and post-processing. On top of this architecture we explore the suitability of three cross-language similarity estimation models: Cross-Language Alignment-based Similarity Analysis (CL-ASA), Cross-Language Character n-Grams (CL-CNG), and Translation plus Monolingual Analysis (T+MA). The three models differ inherently in nature and in the resources they require. They are tested extensively under the same conditions on the different plagiarism detection sub-tasks, something never done before. The experiments show that T+MA produces the best results, closely followed by CL-ASA. Still, CL-ASA obtains higher precision, an important factor in plagiarism detection when less user intervention is desired. |
dc.description.abstract | Peer Reviewed |
dc.language.iso | eng |
dc.rights | info:eu-repo/semantics/openAccess |
dc.subject | Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Sistemes experts |
dc.subject | Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació |
dc.subject | Plagiarism detection systems |
dc.subject | Automatic plagiarism detection; Cross-language plagiarism; Cross-language similarity; Plagiarism detection architecture; Text re-use analysis; Cross languages; Cross-language plagiarism detections; Foreign countries; Plagiarism detection; Similarity analysis; Under-resourced languages; Architecture; Linguistics; Translation (languages); Intellectual property |
dc.subject | Plagi |
dc.title | Methods for cross-language plagiarism detection |
dc.type | info:eu-repo/semantics/publishedVersion |
dc.type | info:eu-repo/semantics/article |