As student-teacher ratios rise and institutions move toward larger class sizes, educators face growing assessment workloads. Online education is one area where very large numbers of students can enroll in a single course, multiplying the marking burden on the lecturer. Plagiarism is a significant concern when grading such submissions, which can legitimately be very similar: assessment tasks that ask students to write code for simple algorithms naturally produce solutions that resemble one another. This paper proposes a method to improve plagiarism detection in computer science courses by dynamically removing the parts of the assessment task that are common to all submissions.
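To make the idea concrete, the following is a minimal sketch of that general strategy: strip whatever code appears in every submission (for example, provided template or boilerplate code) before computing pairwise similarity. The line-level granularity, the whitespace normalization, and the Jaccard measure here are illustrative assumptions, not the authors' exact method.

```python
from itertools import combinations


def normalize(line: str) -> str:
    """Collapse whitespace so formatting differences do not hide shared lines."""
    return " ".join(line.split())


def strip_common_lines(submissions: dict) -> dict:
    """Remove every normalized line that appears in all submissions."""
    line_sets = [
        {normalize(l) for l in code.splitlines() if l.strip()}
        for code in submissions.values()
    ]
    common = set.intersection(*line_sets) if line_sets else set()
    return {
        name: [
            normalize(l)
            for l in code.splitlines()
            if l.strip() and normalize(l) not in common
        ]
        for name, code in submissions.items()
    }


def jaccard(a: list, b: list) -> float:
    """Jaccard similarity over the lines that remain after stripping."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    # Toy corpus: two identical submissions and one independent solution.
    submissions = {
        "alice": (
            "def bubble_sort(xs):\n"
            "    for i in range(len(xs)):\n"
            "        for j in range(len(xs) - 1):\n"
            "            if xs[j] > xs[j + 1]:\n"
            "                xs[j], xs[j + 1] = xs[j + 1], xs[j]\n"
            "    return xs\n"
        ),
        "bob": (
            "def bubble_sort(xs):\n"
            "    for i in range(len(xs)):\n"
            "        for j in range(len(xs) - 1):\n"
            "            if xs[j] > xs[j + 1]:\n"
            "                xs[j], xs[j + 1] = xs[j + 1], xs[j]\n"
            "    return xs\n"
        ),
        "carol": (
            "def bubble_sort(xs):\n"
            "    n = len(xs)\n"
            "    swapped = True\n"
            "    while swapped:\n"
            "        swapped = False\n"
            "        for j in range(n - 1):\n"
            "            if xs[j] > xs[j + 1]:\n"
            "                xs[j], xs[j + 1] = xs[j + 1], xs[j]\n"
            "                swapped = True\n"
            "    return xs\n"
        ),
    }
    stripped = strip_common_lines(submissions)
    for a, b in combinations(stripped, 2):
        print(f"{a} vs {b}: {jaccard(stripped[a], stripped[b]):.2f}")
```

After the lines shared by all three toy submissions are removed, the two identical submissions still score high while the independently written one scores near zero, which is the effect the paper's approach is after: suppressing similarity that every honest solution would share anyway.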
In the paper, the authors describe their method and support it with an analysis of the training corpus provided by the SOCO 2014 challenge. Although the mathematics behind the results may be daunting for the layperson, the conclusions are easy to understand: the authors show that their method performs significantly better than comparable options currently available to educators. This is an interesting paper that should be read by any computer science educator concerned with detecting plagiarism in their courses.