Computing Reviews

Decision tree induction with a constrained number of leaf nodes
Wu C., Chen Y., Liu Y., Yang X. Applied Intelligence 45(3): 673-685, 2016. Type: Article
Date Reviewed: 12/06/16

Decision trees are widely used in machine learning, especially for data-driven decision making. By training on a subset of the data, one can build a decision tree that classifies other data items. The concept is easy to understand. In practice, however, applying decision trees to large, high-dimensional data is challenging because straightforward approaches produce very complicated or impractical trees. Various techniques have therefore been proposed to simplify a decision tree and improve its accuracy; most of them add a pruning phase after the building phase.

The authors of this paper describe an approach they call size-constrained decision trees (SCDT), which avoids the pruning phase by considering accuracy and simplification during the building phase. By placing a size constraint on the maximal number of leaves, the algorithm controls the complexity of the tree. The SCDT also applies a selection process using data clustering. This selection process considers the information gain obtained if a node is split to form two branches. The SCDT splitting process ends either when the tree reaches the predetermined maximal number of leaves or when the number of data items in a node hits a threshold. The authors tested their SCDT approach against traditional decision tree classifiers such as C4.5. On the ten datasets chosen from the UCI (University of California, Irvine) Machine Learning Repository, the accuracy of their SCDT decision trees is comparable to or slightly better than that of trees produced with other approaches.
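To make the stopping rules concrete, the following is a minimal sketch of growing a binary tree under a leaf-count constraint. It is not the authors' SCDT algorithm: the clustering-based selection step is replaced here by a plain exhaustive threshold search, and the names `grow_tree`, `best_split`, `max_leaves`, and `min_items` are hypothetical, introduced only for illustration. The sketch also assumes numeric features and reads the item-count threshold as a minimum node size below which a node is not split.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (gain, feature, threshold) for the best binary split, or None."""
    base, n = entropy(labels), len(labels)
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between distinct feature values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = (base - (len(left) / n) * entropy(left)
                    - (len(right) / n) * entropy(right))
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def grow_tree(rows, labels, max_leaves=4, min_items=2):
    """Repeatedly split the leaf with the highest information gain.

    Growth stops when the tree has max_leaves leaves, or when no leaf
    with at least min_items items offers a positive gain (assumed rule).
    """
    root = {"rows": rows, "labels": labels}
    leaves = [root]
    while len(leaves) < max_leaves:
        best = None  # (index of leaf in `leaves`, split tuple)
        for i, leaf in enumerate(leaves):
            if len(leaf["labels"]) < min_items:
                continue
            split = best_split(leaf["rows"], leaf["labels"])
            if split and split[0] > 0 and (best is None or split[0] > best[1][0]):
                best = (i, split)
        if best is None:
            break
        i, (_, f, t) = best
        leaf = leaves.pop(i)
        pairs = list(zip(leaf["rows"], leaf["labels"]))
        left = [(r, y) for r, y in pairs if r[f] <= t]
        right = [(r, y) for r, y in pairs if r[f] > t]
        leaf["feature"], leaf["threshold"] = f, t
        leaf["left"] = {"rows": [r for r, _ in left], "labels": [y for _, y in left]}
        leaf["right"] = {"rows": [r for r, _ in right], "labels": [y for _, y in right]}
        leaves += [leaf["left"], leaf["right"]]
    return root

def count_leaves(node):
    """Number of leaves in the (sub)tree rooted at node."""
    if "feature" not in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

def predict(node, row):
    """Follow the splits down to a leaf and return its majority label."""
    while "feature" in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return Counter(node["labels"]).most_common(1)[0][0]

# Toy data (made up for illustration): one numeric feature, three classes.
rows = [[1], [2], [3], [4], [5], [6]]
labels = ["a", "a", "b", "b", "c", "c"]
tree = grow_tree(rows, labels, max_leaves=3)
print(count_leaves(tree), predict(tree, [3.5]))
```

Because the leaf budget is enforced while the tree is being built, no separate pruning pass is needed in this sketch; lowering `max_leaves` directly trades accuracy for a simpler tree.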

This paper provides an alternative way of building a binary decision tree without a subsequent pruning phase. By limiting the maximal number of tree leaves, it controls the complexity of the resulting binary decision tree even with large-volume, high-dimensional data. Researchers and practitioners in the fields of data-driven decision making and artificial intelligence should benefit from reading this paper. The SCDT has some advantages, although its improvement in accuracy over existing classifiers is so far only slight. Who knows whether this is the silver bullet?

Reviewer:  Chenyi Hu Review #: CR144958 (1702-0140)
