Computing Reviews
A comparison of human and computer marking of short free-text student responses
Butcher P., Jordan S. Computers & Education 55(2): 489-499, 2010. Type: Article
Date Reviewed: Oct 26 2010

Are computer-assisted assessment (CAA) systems as capable as human instructors of competently evaluating and grading sentence-length responses in written tests? If so, what depth of natural language processing is required? Based on a battery of tests, the authors compare human marking capability with that of CAA systems, and subsequently investigate the differences between computational linguistic and algorithmic quantitative approaches to processing and matching short free-text responses of up to 20 words. While CAA systems such as e-rater [1] are fairly reliable for evaluating free text stylistically in essays, the authors are primarily interested in how well computers can evaluate factual responses to examination questions.

The methodology comprises a comparison of the markings of six course tutors with those of the linguistically based Intelligent Assessment Technologies (IAT) FreeText Author [2], and subsequently with the markings of the algorithmically based OpenMark system [3]. The results show that, for the corpus of seven questions, the mean mark for the CAA system fell within the range of means produced by the tutors, while the differences among the tutors themselves were large (page 492). The algorithmically based OpenMark was as accurate in its marking as the linguistically based IAT.
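To give a flavor of what an algorithmic (as opposed to linguistic) marker does, consider the minimal Python sketch below, which accepts or rejects a short free-text answer by matching required and forbidden patterns. The question, the patterns, and the mark_response function are invented for this illustration; they are not OpenMark's or IAT's actual matching rules.

# Purely illustrative toy marker for short free-text answers.
# Not the OpenMark or IAT algorithm; it only shows the pattern-matching style
# of an algorithmic approach.
import re

def mark_response(response, required, forbidden):
    """Accept a response if every required regular expression matches
    and no forbidden one does."""
    text = response.lower()
    if any(re.search(pat, text) for pat in forbidden):
        return False
    return all(re.search(pat, text) for pat in required)

# Hypothetical question: "Why does a capacitor block direct current?"
required = [r"\bcharg(e|es|ed|ing)\b",
            r"\b(no|not|stops?|blocks?)\b.*\bcurrent\b"]
forbidden = [r"\bresist(or|ance)\b"]

print(mark_response("Once charged, it stops the current flowing.", required, forbidden))  # True
print(mark_response("Its resistance blocks the current.", required, forbidden))           # False

Even this crude sketch makes the reviewers' point plausible: once the patterns are fixed, the computer applies them identically to the thousandth answer as to the first, which is exactly where tired human graders drift.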

It is not surprising that both the linguistic and the algorithmic methods produced adequate recognition results, given the constrained subset of natural language involved and the extensive study of both fields within cognitive science. However, the finding that these CAA systems had a distinct advantage over the human graders in the consistency and accuracy of matching answers to questions is noteworthy, albeit plausible. Anyone who has graded hundreds of finals in a short time span can appreciate the high probability of making mistakes when grading short sentence answers. Computers are simply better equipped than humans to handle large volumes of data consistently.

Reviewer: Klaus K. Obermeier | Review #: CR138521 (1105-0532)
1) Attali, Y.; Burstein, J. Automated essay scoring with e-rater V.2. The Journal of Technology, Learning, and Assessment 4, 3 (2006), 1–31. http://www.jtla.org.
2) Jordan, S.; Mitchell, T. E-assessment for learning? The potential of short free-text questions with tailored feedback. British Journal of Educational Technology 40, 2 (2009), 371–385.
3) Butcher, P. G. OpenMark examples. http://www.open.ac.uk/openmarkexamples/ (09/26/2010).
Categories: Linguistic Processing (H.3.1); Content Analysis And Indexing (H.3.1)