Computing Reviews
Towards automatic identification of core concepts in educational resources
Sultan M., Bethard S., Sumner T. JCDL 2014 (Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries, London, UK, Sep 8-12, 2014), 379-388. 2014. Type: Proceedings
Date Reviewed: May 27 2015

This paper studies the problem of recognizing the degree of similarity between ideas expressed in sentences. In particular, the authors consider the case of science education, where a set of core ideas expressed in sentences exists and one wants to know how close the idea in another sentence is to that core.

The paper proposes a two-phase algorithm to solve this problem. In the first phase, relevant features are extracted from both the core sentence and the sentence being examined. In the second phase, a machine learning classifier, trained on human annotations, produces a label giving the degree of “coreness” of the sentence being examined. Much of the paper deals with computing the classification features. One computation is based on string similarity, that is, the number of words or character sequences that the two sentences share. Character sequences of lengths two through five are considered. A second computation is based on semantic similarity, that is, the relationship between the meanings of the two sentences. Two external resources were used for identifying meanings: ConceptNet [1] and Wikipedia Miner [2].
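To make the string-similarity features concrete, here is a minimal Python sketch of word overlap and character n-gram overlap for lengths two through five. The review does not say how the paper normalizes the counts, so the Jaccard ratio used here is an assumption, and the function names are hypothetical.

```python
def char_ngrams(text, n):
    """Return the set of character n-grams of a lowercased string."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Shared items divided by all items; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def string_similarity_features(core, candidate, sizes=(2, 3, 4, 5)):
    """Word overlap plus one character n-gram overlap per length in `sizes`.

    Assumption: Jaccard normalization stands in for whatever scaling
    the paper actually applies to its overlap counts.
    """
    features = {
        "word_overlap": jaccard(set(core.lower().split()),
                                set(candidate.lower().split()))
    }
    for n in sizes:
        features[f"char_{n}gram_overlap"] = jaccard(
            char_ngrams(core, n), char_ngrams(candidate, n))
    return features
```

Calling `string_similarity_features(core_sentence, other_sentence)` yields five numbers that could be passed to a classifier alongside the other feature groups.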

To compute semantic similarity, the authors weight words by their information content; low-frequency words have greater information content. A third computation measures the probability that the sentence in question would be generated by the set of words in the core set, using a computed probability distribution of words in the core set. Finally, a set of shallow sentence features is computed. The shallow feature used in the classifier is sentence length. In summary, the authors chose to use string similarity, semantic similarity, a generative model, and a shallow feature as input to the classifier.
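The information-content weighting and the generative feature can be illustrated with a short Python sketch. It assumes information content is the negative log of a word's relative frequency and that the generative model is a unigram model with add-one smoothing; the review confirms neither choice, and the function names are hypothetical.

```python
import math
from collections import Counter


def information_content(word_counts):
    """Map each word to -log(relative frequency); rarer words score higher."""
    total = sum(word_counts.values())
    return {w: -math.log(c / total) for w, c in word_counts.items()}


def unigram_log_prob(sentence_words, core_counts, vocab_size):
    """Log probability of a sentence under a unigram model of the core set.

    Assumption: add-one smoothing, so words absent from the core set
    still receive a small nonzero probability.
    """
    total = sum(core_counts.values())
    return sum(
        math.log((core_counts.get(w, 0) + 1) / (total + vocab_size))
        for w in sentence_words
    )


# Toy core set, used only to exercise the two functions.
core_counts = Counter("the earth orbits the sun".split())
weights = information_content(core_counts)
```

In this toy core set, "sun" occurs less often than "the" and so receives the larger weight, which matches the statement that low-frequency words carry more information.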

The classifier was evaluated using accuracy, precision, recall, and the harmonic mean of precision and recall (the F1 score). These scores were compared with those of other identification systems, including the one presented by Foster et al. [3] and the COGENT system [4]. The full-featured model described here performed better: its accuracy was 13% better than that of Foster’s system, which is only semi-automated, and 44% better than that of COGENT. Performance measured by the other metrics was similar. The authors claim their approach is computationally reasonable but do not give pertinent details. The results presented in the paper are promising. The intended audience of this work is developers of algorithms for extracting core concepts in written documents.
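For readers unfamiliar with the four evaluation metrics, the following Python sketch computes them for a two-class labeling. The label names and the binary setup are assumptions for illustration; the review does not state how many coreness labels the paper uses.

```python
def evaluate(gold, predicted, positive="core"):
    """Return accuracy, precision, recall, and their harmonic mean (F1)."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    correct = sum(g == p for g, p in pairs)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

The function takes two equal-length lists of labels, the human annotations and the classifier output, and returns the four scores in a dictionary.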

Reviewer:  B. Hazeltine Review #: CR143474 (1508-0729)
1) Liu, H.; Singh, P. ConceptNet – a practical commonsense reasoning tool-kit. BT Technology J. 22, 4 (2004), 211–226.
2) Milne, D.; Witten, I. An open-source toolkit for mining Wikipedia. Artificial Intelligence 194 (2013), 222–239.
3) Foster, J. M.; Sultan, M. A.; Devaul, H.; Okoye, I.; Sumner, T. Identifying core concepts in educational resources. In Proc. 12th ACM/IEEE-CS Joint Conf. Digital Libraries (New York, ), ACM/IEEE-CS, 2012, 35–42.
4) de la Chica, S.; Ahmad, F.; Martin, J. H.; Sumner, T. Pedagogically useful extractive summaries for science education. In Proc. 22nd Int. Conf. Computational Linguistics – Volume 1 (Manchester, England), Association for Computational Linguistics, 2008, 177–184.
Text Analysis (I.2.7 ... )
Computer-Assisted Instruction (CAI) (K.3.1 ... )
Computer Uses in Education (K.3.1 )
Digital Libraries (H.3.7 )
Natural Language Processing (I.2.7 )
 