Computing Reviews
Decision tree induction with a constrained number of leaf nodes
Wu C., Chen Y., Liu Y., Yang X. Applied Intelligence 45(3): 673-685, 2016. Type: Article
Date Reviewed: Dec 6 2016

Decision trees are broadly applied in machine learning, especially for data-driven decision making. By training on a subset of the data, one can build a decision tree that classifies other data items. The concept of a decision tree is easy to understand. In practice, however, applying decision trees to large, high-dimensional datasets is challenging because straightforward approaches produce very complicated or impractical trees. Hence, various techniques have been developed to simplify a decision tree and improve its accuracy; most of them add a pruning phase after the building phase.
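As a point of reference (not taken from the paper), the usual grow-then-prune workflow can be sketched in a few lines of Python with scikit-learn, here using cost-complexity pruning; the dataset and the ccp_alpha value are arbitrary choices for illustration only.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Arbitrary dataset and pruning strength, chosen only to illustrate
    # the two-phase (grow, then prune) approach; nothing here comes from the paper.
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

    print("unpruned: %d leaves, test accuracy %.3f" % (full.get_n_leaves(), full.score(X_te, y_te)))
    print("pruned:   %d leaves, test accuracy %.3f" % (pruned.get_n_leaves(), pruned.score(X_te, y_te)))

Typically the pruned tree has far fewer leaves at little or no cost in test accuracy; the SCDT approach reviewed here aims for that same trade-off in a single building phase.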

The authors of this paper describe an approach they call size-constrained decision trees (SCDT), which avoids the pruning phase by taking both accuracy and simplicity into account during the building phase. By placing a constraint on the maximum number of leaves, the algorithm controls the complexity of the tree. SCDT also applies a node selection process based on data clustering; this process considers the information gain that would result from splitting a node into two branches. Splitting stops when the tree reaches the predetermined maximum number of leaves or when the number of data items in a node falls below a threshold. The authors tested SCDT against traditional decision tree classifiers such as C4.5. On ten datasets chosen from the UCI Machine Learning Repository, the accuracy of the SCDT trees is comparable to or slightly better than that of trees produced by the other approaches.
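To make the growth procedure concrete, the following minimal Python sketch grows a binary tree best-first under a leaf budget, always splitting the leaf whose best split yields the highest information gain, and stopping at a maximum number of leaves or a minimum node size. It follows only the stopping rules described above and omits the authors' clustering-based selection step, so it should not be read as their algorithm; the parameter names max_leaves and min_samples are hypothetical.

    import heapq
    import numpy as np

    def entropy(y):
        # Shannon entropy of the class labels y
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    def best_split(X, y):
        # return (gain, feature, threshold) of the best binary split, or None
        best, base, n = None, entropy(y), len(y)
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f])[:-1]:      # all but the largest value
                left = X[:, f] <= t
                nl = int(left.sum())
                gain = base - (nl / n) * entropy(y[left]) - ((n - nl) / n) * entropy(y[~left])
                if best is None or gain > best[0]:
                    best = (gain, f, t)
        return best

    def grow(X, y, max_leaves=8, min_samples=5):   # hypothetical parameter names
        root = {"idx": np.arange(len(y))}
        heap, tie, n_leaves = [], 0, 1             # max-heap of candidate splits

        def push(node):
            nonlocal tie
            if len(node["idx"]) >= min_samples:
                s = best_split(X[node["idx"]], y[node["idx"]])
                if s is not None and s[0] > 0:
                    heapq.heappush(heap, (-s[0], tie, node, s))
                    tie += 1

        push(root)
        while heap and n_leaves < max_leaves:      # split the highest-gain leaf first
            _, _, node, (gain, f, t) = heapq.heappop(heap)
            idx = node["idx"]
            mask = X[idx, f] <= t
            node.update(feature=f, threshold=t,
                        left={"idx": idx[mask]}, right={"idx": idx[~mask]})
            n_leaves += 1
            push(node["left"])
            push(node["right"])
        return root

    def predict_one(tree, y, x):
        node = tree
        while "feature" in node:                   # descend to a leaf
            node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
        labels, counts = np.unique(y[node["idx"]], return_counts=True)
        return labels[np.argmax(counts)]           # majority class of the leaf

For quick experiments, scikit-learn's DecisionTreeClassifier(max_leaf_nodes=k) provides similar best-first growth under a leaf budget, again without the paper's clustering-based selection.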

This paper provides an alternative way of building a binary decision tree without a subsequent pruning phase. By limiting the maximum number of leaves, it controls the complexity of the resulting tree even on large, high-dimensional data. Researchers and practitioners in data-driven decision making and artificial intelligence should benefit from reading this paper. SCDT has some advantages, although its performance improvement is not yet very significant; whether it is a silver bullet remains to be seen.

Reviewer: Chenyi Hu. Review #: CR144958 (1702-0140)
Nonnumerical Algorithms And Problems (F.2.2)
Data Mining (H.2.8 ...)
Record Classification (H.3.2 ...)
Other reviews under "Nonnumerical Algorithms And Problems":
Improving the performance guarantee for approximate graph coloring
Wigderson A. Journal of the ACM 30(4): 729-735, 1983. Type: Article (Feb 1 1985)
Fast algorithms constructing minimal subalgebras, congruences, and ideals in a finite algebra
Demel J., Demlová M., Koubek V. Theoretical Computer Science 36(2-3): 203-216, 1985. Type: Article (Jan 1 1986)
Complexity of the word problem for commutative semigroups of fixed dimension
Huynh D. Acta Informatica 22(4): 421-432, 1985. Type: Article (May 1 1986)
