World Scientific
Special Issue on The Mexican Conference on Pattern Recognition; Guest Editors: José Fco. Martínez-Trinidad (National Institute of Astrophysics, Optics and Electronics, Mexico), Jesús Ariel Carrasco-Ochoa (National Institute of Astrophysics, Optics and Electronics, Mexico), Víctor Ayala-Ramírez (University of Guanajuato, Mexico) and José Arturo Olvera-López (Autonomous University of Puebla (BUAP), Mexico)

Plagiarism Detection with Genetic-Based Parameter Tuning

    https://doi.org/10.1142/S0218001418600066

    A crucial step in plagiarism detection is text alignment: finding similar text fragments between two given documents. We introduce an optimization methodology based on genetic algorithms that improves the performance of a plagiarism detection model by optimizing its input parameters. The genetic algorithm uses a nonbinary representation of individuals, elitist selection, uniform crossover, and a high mutation rate. The resulting parameter settings allow the plagiarism detection model to achieve better results than state-of-the-art approaches.
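
The four ingredients named in the abstract (nonbinary individuals, elitism, uniform crossover, high mutation rate) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The parameter names and ranges in `PARAM_RANGES`, the population size, the mutation rate of 0.3, and `toy_fitness` are all hypothetical choices made for illustration. In a real setting the fitness function would presumably run the detection model with the candidate parameters and return its evaluation score on a training corpus.

```python
import random

# Hypothetical integer-valued parameters and ranges (not taken from the paper).
PARAM_RANGES = {
    "min_fragment_len": (50, 500),
    "max_gap": (0, 30),
    "similarity_threshold_pct": (10, 90),
}


def random_individual(rng):
    """Nonbinary representation: each gene is an integer parameter value."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}


def uniform_crossover(a, b, rng):
    """Each gene is copied from one of the two parents with equal probability."""
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in PARAM_RANGES}


def mutate(individual, rate, rng):
    """Resample each gene within its range with probability `rate`."""
    child = dict(individual)
    for name, (lo, hi) in PARAM_RANGES.items():
        if rng.random() < rate:
            child[name] = rng.randint(lo, hi)
    return child


def evolve(fitness, pop_size=20, generations=30, elite=2, mutation_rate=0.3, seed=0):
    """Return the fittest individual found; higher fitness is better."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:elite]           # elitism: best individuals survive unchanged
        parents = ranked[: pop_size // 2]   # breed only from the fitter half
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            child = uniform_crossover(a, b, rng)
            next_gen.append(mutate(child, mutation_rate, rng))
        population = next_gen
    return max(population, key=fitness)


def toy_fitness(individual):
    """Stand-in objective: negative distance to an arbitrary target setting."""
    target = {"min_fragment_len": 150, "max_gap": 4, "similarity_threshold_pct": 33}
    return -sum(abs(individual[k] - target[k]) for k in target)


if __name__ == "__main__":
    print(evolve(toy_fitness))
```

Because the elite individuals are copied unchanged into each new generation, the best fitness in the population never decreases, which makes a relatively high mutation rate safe to use for exploration.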
