Computing Reviews

A comparison of human and computer marking of short free-text student responses
Butcher, P.; Jordan, S. Computers & Education 55(2): 489-499, 2010. Type: Article
Date Reviewed: 10/26/10

Are computer-assisted assessment (CAA) systems as capable as human instructors of competently evaluating and grading sentence-length responses in written tests? If so, what depth of natural language processing is required? Based on a battery of tests, the authors compare human marking capability with that of CAA systems, and subsequently investigate differences between the computational-linguistic and the algorithmic, quantitative approaches to processing and matching short free-text responses of up to 20 words. While CAA systems such as e-rater [1] are fairly reliable for evaluating free text stylistically in essays, the authors were primarily interested in how well computers can evaluate factual responses to examination questions.

The methodology compares the markings of six course tutors with those of the linguistically based Intelligent Assessment Technologies FreeText Author (IAT) [2], and subsequently with the markings of the algorithmically based OpenMark system [3]. The tests show that, for the corpus of seven questions, the mean mark for the CAA system was within the range of means produced by the tutors, while differences among the tutors were large (page 492). The algorithmically based OpenMark was as accurate in its marking as the linguistically based IAT.
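
The paper itself contains no code, but the flavor of the algorithmic, pattern-matching approach can be conveyed with a minimal sketch. The mark scheme, patterns, and function below are hypothetical illustrations, not the actual OpenMark or IAT implementation; they merely show how a short free-text response might be matched against accepted answer models.

import re

# Hypothetical mark scheme for one short-answer question: each accepted
# answer model is a list of regular expressions that must all match for
# the response to earn the mark. This mirrors the keyword/pattern-matching
# idea behind algorithmic markers, not any real system's rules.
MARK_SCHEME = [
    [r"\bspeed\b", r"\bincreas(e|es|ing)\b"],   # accepted phrasing 1
    [r"\baccelerat(e|es|ing|ion)\b"],           # accepted phrasing 2
]

def mark_response(response: str) -> int:
    """Return 1 if the response matches any accepted answer model, else 0."""
    text = response.lower()
    for patterns in MARK_SCHEME:
        if all(re.search(p, text) for p in patterns):
            return 1
    return 0

print(mark_response("The speed of the object increases."))  # 1
print(mark_response("It accelerates."))                      # 1
print(mark_response("It stays the same."))                   # 0

Unlike a human grader working through hundreds of scripts, such a marker applies exactly the same rules to every response, which is the consistency advantage the authors observe.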

It is not surprising that both the linguistic and the algorithmic methods produced adequate recognition results, given the constrained subset of natural language involved and the extensive study both fields have received within cognitive science. However, the finding that these CAA systems hold a distinct advantage over human graders in the consistency and accuracy of matching answers to questions is noteworthy, albeit plausible. Anyone who has graded hundreds of finals in a short time span can appreciate how easily mistakes creep into the grading of short, sentence-length answers. Computers are simply better equipped than humans to handle large volumes of data consistently.


1) Attali, Y.; Burstein, J. Automated essay scoring with e-rater V.2. The Journal of Technology, Learning, and Assessment 4, 3 (2006), 1–31. http://www.jtla.org.

2) Jordan, S.; Mitchell, T. E-assessment for learning? The potential of short free-text questions with tailored feedback. British Journal of Educational Technology 40, 2 (2009), 371–385.

3) Butcher, P. G. OpenMark examples. http://www.open.ac.uk/openmarkexamples/ (09/26/2010).

Reviewer: Klaus K. Obermeier   Review #: CR138521 (1105-0532)
