Understanding the functions of gene structures is germane to curing a variety of diseases. How should biological information from open-access repositories be integrated with gene expression data to better understand the roles of these structures? Nepomuceno and colleagues propose an algorithm that integrates biological information from multiple sources to enrich the search for patterns in gene expression data.
A biclustering algorithm is an unsupervised machine learning procedure for identifying structure in gene expression data matrices. As a fitness function for discovering biologically relevant biclusters, the authors present a weighted sum of terms for the volume, pattern, and quality of a bicluster of genes. According to the paper, “A measure based on the analysis of the enrichment of a bicluster is proposed ... to measure the biological relevance of a bicluster. The measure is the proportion or fraction of genes in a bicluster associated with enriched gene ontology (GO) terms.” The average overlap among pairs of genes in a bicluster is used to derive a formula for identifying groups of genes that share similar biological functions in GO.
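The two measures described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy gene and term identifiers, and the use of Jaccard similarity as the pairwise overlap are assumptions made for illustration, and the set of enriched GO terms is taken as given rather than computed by a statistical test.

```python
from itertools import combinations


def enriched_fraction(bicluster_genes, annotations, enriched_terms):
    """Fraction of genes in the bicluster annotated with at least one enriched GO term."""
    genes = list(bicluster_genes)
    if not genes:
        return 0.0
    hits = sum(1 for g in genes if annotations.get(g, set()) & enriched_terms)
    return hits / len(genes)


def mean_pairwise_overlap(bicluster_genes, annotations):
    """Average overlap of GO term sets over all gene pairs.

    Jaccard similarity is used here as an assumed overlap measure.
    """
    pairs = list(combinations(sorted(bicluster_genes), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        terms_a = annotations.get(a, set())
        terms_b = annotations.get(b, set())
        union = terms_a | terms_b
        total += len(terms_a & terms_b) / len(union) if union else 0.0
    return total / len(pairs)


# Hypothetical annotations: gene identifier -> set of GO terms.
annotations = {
    "g1": {"GO:A", "GO:B"},
    "g2": {"GO:A"},
    "g3": {"GO:C"},
    "g4": set(),
}
enriched = {"GO:A"}  # assumed to come from a separate enrichment test

print(enriched_fraction(["g1", "g2", "g3", "g4"], annotations, enriched))  # 0.5
print(mean_pairwise_overlap(["g1", "g2"], annotations))  # 0.5
```

In a fitness function of the kind the review describes, values like these would be weighted and summed with the volume and pattern terms; the weights themselves are not specified here.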
The idea behind scatter search is to generate an improved population of solutions by combining members of a small set of good solutions. The paper describes an iterative algorithm that applies an independent scatter search to find each bicluster. This sequential strategy, together with the bias that the fitness function introduces into the search, controls the nondeterministic character of this population-based evolutionary metaheuristic.
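For readers unfamiliar with the metaheuristic, the following is a generic sketch of a scatter search loop, not the algorithm from the paper. The binary encoding, the Hamming-distance diversity criterion, the uniform combination of pairs, the omission of a local improvement step, and all names and parameter values are assumptions chosen to keep the example short.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def scatter_search(fitness, n_bits, pop_size=30, ref_size=6, max_iter=50, seed=0):
    """Maximize `fitness` over bit tuples with a simplified scatter search."""
    rng = random.Random(seed)
    population = [
        tuple(rng.randint(0, 1) for _ in range(n_bits)) for _ in range(pop_size)
    ]

    # Reference set: the best solutions by fitness, then the most distant ones.
    ranked = sorted(set(population), key=fitness, reverse=True)
    ref = ranked[: ref_size // 2]
    rest = ranked[ref_size // 2:]
    while len(ref) < ref_size and rest:
        far = max(rest, key=lambda s: min(hamming(s, r) for r in ref))
        ref.append(far)
        rest.remove(far)

    for _ in range(max_iter):
        # Combine every pair in the reference set (uniform mix of the two parents).
        candidates = set()
        for a, b in combinations(ref, 2):
            child = tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))
            candidates.add(child)

        # Keep the best solutions; stop once the reference set no longer changes.
        merged = sorted(set(ref) | candidates, key=fitness, reverse=True)[:ref_size]
        if set(merged) == set(ref):
            break
        ref = merged

    return max(ref, key=fitness)


# Toy objective: count the ones in the bit tuple.
best = scatter_search(sum, n_bits=12)
print(best, sum(best))
```

In a sequential scheme like the one the review describes, a routine of this kind would be called once per bicluster, with the fitness function steering each run.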
Is the proposed biclustering algorithm effective in enriching the GO terms of the biclusters it finds? Two experiments were performed with two yeast datasets. The annotation file of each dataset is a tree structure in which genes are linked to GO terms and to their parent nodes all the way up to the root node. One annotation file of 632 genes contains 245 distinct GO terms with an average of 10.6 terms per gene; the other contains 658 genes with 256 GO terms and an average of 10.1 terms per gene. The proposed algorithm was tested on biclusters of different sizes and with alternative configurations of the fitness function. The experimental results show that, in comparison to the classical biclustering techniques in the literature, the algorithm integrates biological information more effectively and finds enriched biclusters of higher quality. The authors present valuable insights into the methodology for integrating biological information from several sources, and for grouping enriched GO terms into groups related to metabolism and protein ubiquitination.