Computing Reviews
A comparison of human and computer marking of short free-text student responses
Butcher P., Jordan S. Computers & Education 55(2): 489-499, 2010. Type: Article
Date Reviewed: Oct 26 2010

Are computer-assisted assessment (CAA) systems as capable as human instructors of competently evaluating and grading sentence-length responses in written tests? If so, what depth of natural language processing is required? Based on a battery of tests, the authors compare human marking capability with that of CAA systems, and subsequently investigate the differences between computational linguistic and algorithmic quantitative approaches to processing and matching short free-text responses of up to 20 words. While CAA systems such as e-rater [1] are fairly reliable for evaluating free text stylistically in essays, the authors are primarily interested in how well computers can evaluate factual responses to examination questions.

The methodology comprises a comparison of the markings of six course tutors with those of the linguistically based Intelligent Assessment Technologies (IAT) FreeText Author [2], and subsequently with the markings of the algorithmically based OpenMark system [3]. The results show that, for the corpus of seven questions, the mean mark for the CAA system fell within the range of means produced by the tutors, while the differences among the tutors themselves were large (page 492). The algorithmically based OpenMark was as accurate in its marking as the linguistically based IAT.
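To give a flavor of what an algorithmic (as opposed to linguistic) marker does, consider the minimal Python sketch below, which accepts or rejects a short free-text answer by matching required and forbidden patterns. The question, the patterns, and the mark_response function are invented for this illustration; they are not OpenMark's or IAT's actual matching rules.

# Purely illustrative toy marker for short free-text answers.
# Not the OpenMark or IAT algorithm; it only shows the pattern-matching style
# of an algorithmic approach.
import re

def mark_response(response, required, forbidden):
    """Accept a response if every required regular expression matches
    and no forbidden one does."""
    text = response.lower()
    if any(re.search(pat, text) for pat in forbidden):
        return False
    return all(re.search(pat, text) for pat in required)

# Hypothetical question: "Why does a capacitor block direct current?"
required = [r"\bcharg(e|es|ed|ing)\b",
            r"\b(no|not|stops?|blocks?)\b.*\bcurrent\b"]
forbidden = [r"\bresist(or|ance)\b"]

print(mark_response("Once charged, it stops the current flowing.", required, forbidden))  # True
print(mark_response("Its resistance blocks the current.", required, forbidden))           # False

Even this crude sketch makes the reviewers' point plausible: once the patterns are fixed, the computer applies them identically to the thousandth answer as to the first, which is exactly where tired human graders drift.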

It is not surprising that both the linguistic and the algorithmic methods produced adequate recognition results, given the constrained subset of natural language involved and the extensive study of both fields within cognitive science. However, the finding that these CAA systems had a distinct advantage over the human graders in the consistency and accuracy of matching answers to questions is noteworthy, albeit plausible. Anyone who has graded hundreds of finals in a short time span can appreciate the high probability of making mistakes when grading short sentence answers. Computers are simply better equipped than humans to handle large volumes of data consistently.

Reviewer: Klaus K. Obermeier | Review #: CR138521 (1105-0532)
1) Attali, Y.; Burstein, J. Automated essay scoring with e-rater V.2. The Journal of Technology, Learning, and Assessment 4, 3 (2006), 1–31. http://www.jtla.org.
2) Jordan, S.; Mitchell, T. E-assessment for learning? The potential of short free-text questions with tailored feedback. British Journal of Educational Technology 40, 2 (2009), 371–385.
3) Butcher, P. G. OpenMark examples. http://www.open.ac.uk/openmarkexamples/ (09/26/2010).
Categories: Linguistic Processing (H.3.1); Content Analysis And Indexing (H.3.1)