ComputingReviews.com

Using regression makes extraction of shared variation in multiple datasets easy
Korpela J., Henelius A., Ahonen L., Klami A., Puolamäki K. Data Mining and Knowledge Discovery 30(5): 1112-1133, 2016. Type: Article

Date Reviewed: 11/23/16

This interesting paper presents a method that could be of value to data analysts looking for commonalities among datasets that appear to be unrelated. Shared variation is a meaningful quantity because it shows which aspects of two or more datasets behave alike. In the first chart in the paper, the authors show how applying the method to the price changes over time of different commodities derives a common trend line for the prices of natural gas and oil, iron ore and nickel, and beef and lamb. Such an analysis can show whether the datasets in a category vary together, and those commonalities can then inform manufacturing, marketing, investment, and so on.

The authors clearly describe what they are going to do and then do it. They present an example of their method for deriving shared variation, show how it fits with existing methods and that it extends redundancy analysis to more than two datasets, and then analyze various datasets, comparing their method's output with that of existing methods for finding shared variation.

What distinguishes this method, which the authors call the COCOREG algorithm, from other methods is its use of chains of pairwise regressions: each regression in a chain filters the data, so that only the variation shared between datasets is passed along. The method works well with multiple datasets and gives good results.
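
The chaining idea can be illustrated with a minimal sketch. This is not the authors' COCOREG implementation: it assumes ordinary least squares as the pairwise regressor, a single fixed chain order, in-sample fitting, and synthetic data. The helper names (`fit_predict`, `chain_projection`, `make_demo`, `corr`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_predict(source, target):
    """Least-squares prediction of `target` from `source`, with an intercept."""
    design = np.column_stack([source, np.ones(len(source))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef


def chain_projection(datasets):
    """Pass the first dataset through a chain of pairwise regressions.

    Each step predicts the next dataset from the current prediction, so the
    final output is the part of the last dataset that is predictable
    through the chain.
    """
    current = datasets[0]
    for nxt in datasets[1:]:
        current = fit_predict(current, nxt)
    return current


def make_demo(n=500, seed=0):
    """Synthetic data (an assumption): three datasets, each with one column
    driven by a common signal and one column driven by a private signal."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=n)
    datasets, privates = [], []
    for _ in range(3):
        private = rng.normal(size=n)
        noise = 0.1 * rng.normal(size=(n, 2))
        datasets.append(np.column_stack([shared, private]) + noise)
        privates.append(private)
    return shared, privates, datasets


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


if __name__ == "__main__":
    shared, privates, datasets = make_demo()
    estimate = chain_projection(datasets)
    print("correlation with shared signal: ", corr(estimate[:, 0], shared))
    print("correlation with private signal:", corr(estimate[:, 0], privates[-1]))
```

In this toy setting, the first column of the chain output should track the common signal closely and be nearly uncorrelated with the last dataset's private signal. A fuller treatment would combine several chain orderings and guard against overfitting, which this sketch does not attempt.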

The authors provide a thorough logical derivation of the method, along with experimental results from several different sets and types of data. The results of the analyses are displayed in the figures near the end of the paper. They clearly show that the method can extract shared variation from various kinds of data, in most cases more accurately than other methods.

Although it is not necessarily of use to the vast majority of computer users, this process could be extremely valuable to anyone analyzing multiple datasets in search of commonalities.

Reviewer: Michael Moorman

Review #: CR144941 (1702-0147)
