Plagiarism meets paraphrasing: insights for the new generation in automatic plagiarism detection

dc.contributor.author
Barrón-Cedeño, Alberto
dc.contributor.author
Vila Rigat, Marta
dc.contributor.author
Martí Antonin, M. Antònia
dc.contributor.author
Rosso, Paolo
dc.date.issued
2014-02-04T08:41:54Z
dc.date.issued
2014-03-01T23:02:07Z
dc.date.issued
2013-12-01
dc.date.issued
2014-02-03T16:45:38Z
dc.identifier
0891-2017
dc.identifier
https://hdl.handle.net/2445/49363
dc.identifier
619558
dc.description.abstract
Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyse the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource which uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analysed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarising, and (iii) paraphrase mechanisms tend to shorten the plagiarised text. For the first time, the paraphrase mechanisms behind plagiarism have been analysed, providing critical insights for the improvement of automatic plagiarism detection systems.
dc.format
23 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
The MIT Press
dc.relation
Reproduction of the document published at: http://dx.doi.org/10.1162/COLI_a_00153
dc.relation
Computational Linguistics, 2013, vol. 39, num. 4, p. 917-947
dc.relation
http://dx.doi.org/10.1162/COLI_a_00153
dc.relation
info:eu-repo/grantAgreement/EC/FP7/269180/EU//WIQ-EI
dc.relation
info:eu-repo/grantAgreement/EC/FP7/246016/EU//ABCDE
dc.rights
(c) Association for Computational Linguistics, 2013
dc.rights
info:eu-repo/semantics/openAccess
dc.source
Articles published in journals (Filologia Catalana i Lingüística General)
dc.subject
Plagi
dc.subject
Paràfrasi
dc.subject
Lingüística computacional
dc.subject
Tractament del llenguatge natural (Informàtica)
dc.subject
Plagiarism
dc.subject
Paraphrase
dc.subject
Computational linguistics
dc.subject
Natural language processing (Computer science)
dc.title
Plagiarism meets paraphrasing: insights for the new generation in automatic plagiarism detection
dc.type
info:eu-repo/semantics/article
dc.type
info:eu-repo/semantics/publishedVersion

