Computing Reviews
Evaluating the effects of missing values and mixed data types on social sequence clustering using t-SNE visualization
Lazar A., Jin L., Spurlock C., Wu K., Sim A., Todd A. Journal of Data and Information Quality 11(2): 1-22, 2019. Type: Article
Date Reviewed: Apr 21 2021

The authors discuss how to compensate for missing values when clustering joint categorical sequences. For example, one might have a set of sequences that combine nominal values, such as family size, with binary values, such as marriage status. Such sequences usually contain missing values, yet discarding every incomplete sequence would leave too few sequences to support valid conclusions. Some missing values, such as age, can be inferred, but often this is not the case.

The authors consider various ways to address missing values. The principal tool is an edit distance, which measures the number and size of the changes required to edit one sequence into another. When each entry in a sequence consists of multiple values, one can use the average of the edit distances computed for each category. Given a distance, one can proceed to infer a set of clusters of “like” sequences.
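
The averaging idea can be made concrete with a minimal sketch. It is not the authors' implementation: it assumes unit costs for insertion, deletion, and substitution (the paper compares several cost choices), does not treat missing values specially, and the names `edit_distance`, `multichannel_distance`, `person_a`, and `person_b` are hypothetical, introduced only for illustration.

```python
from typing import Hashable, Mapping, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance with unit insert/delete/substitute costs (an assumption)."""
    # prev[j] holds the distance between the first i-1 items of a and the first j of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning the first i items of a into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if x == y else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def multichannel_distance(
    p: Mapping[str, Sequence[Hashable]],
    q: Mapping[str, Sequence[Hashable]],
) -> float:
    """Average the per-category edit distances between two multi-valued sequences."""
    if p.keys() != q.keys():
        raise ValueError("both sequences must have the same categories")
    if not p:
        raise ValueError("at least one category is required")
    return sum(edit_distance(p[c], q[c]) for c in p) / len(p)


# Hypothetical yearly records: a nominal category and a binary one.
person_a = {"family_size": [1, 2, 2, 3], "married": [0, 1, 1, 1]}
person_b = {"family_size": [1, 1, 2, 3], "married": [0, 0, 0, 1]}

print(multichannel_distance(person_a, person_b))
```

Evaluating this function on every pair of sequences gives a distance matrix, which is the input a distance-based clustering method would need.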

Using a study of income dynamics as an exemplar that exhibits both binary and nominal values, the authors provide a detailed description of the process. They give experimental results comparing various choices of distance. The income dynamics sequences provide sufficient variety to allow several distance measures to be compared.

By using dimension reduction techniques, the authors are able to provide visual representations of the clusters corresponding to the distance choices. The paper is a useful guide to the available techniques, with sufficient illustrative examples so that readers can apply the ideas.

Reviewer:  J. P. E. Hodgson Review #: CR147245 (2108-0218)
Learning (I.2.6)
General (I.0)