Grouping or clustering elements based on their relative similarities is a principal step in data analysis, with bioinformatics being one of the applications. Similarity can be defined by a number that measures the strength of the relation, leading to the well-studied k-center, k-sum, and k-median problems. Alternatively, the similarity of a pair of elements can be given as a label “+” (similar) or “-” (dissimilar); such a qualitative notion of similarity is considered in this paper. When cast as an optimization problem for graphs, two versions are of interest. MaxAgree tries to maximize the sum of the number of pluses within the clusters, and minuses between clusters. MinDisAgree is interested in minimizing the sum of minuses within clusters, and pluses between them.
This paper follows the study spearheaded by Bansal et al. [1], and answers several questions that provide a good insight into approximability of clustering with qualitative information, both for complete and for general graphs. Among the results for general graphs (that is, where similarity is known only between some pairs), the authors provide a factor O(log n) approximation for MinDisAgree, as well as an approximable (APX) hardness and a factor 0.766 approximation for MaxAgree.
In addition to shedding new light on inapproximability, the paper demonstrates the use of a number of algorithmic techniques for designing approximation algorithms. Among them are general techniques of LP relaxation, semidefinite relaxation, and, more specifically, region growing for minimum multicut and reductions to 3-satisfiability (3SAT). Altogether, thanks to its comprehensive discussion, this paper represents a valuable addition to the literature for students and researchers interested in approximation and in clustering in particular.