Protein sequence analysis is a set of techniques for discovering the structures and functions of proteins in living organisms. It draws on sequence comparison, identification of intrinsic features, sequence differences and variations, and molecular structures, and it helps reveal evolution and genetic diversity. The authors present a computationally efficient method for revealing protein functionality from sequences by aligning and clustering patterns.
The method, the aligned pattern (AP) synthesis process, consists of three steps: pattern discovery, AP clustering, and AP cluster refinement. Step one finds nonredundant, statistically significant associations of amino acids. It uses a fast, space-efficient algorithm that applies statistical conditions as confidence thresholds, so that only statistically significant and nonredundant patterns are retained. Step two groups and aligns sequence patterns and synthesizes them into AP clusters. The authors employ the global Needleman-Wunsch alignment algorithm and the local Smith-Waterman alignment algorithm to merge two AP clusters into one during hierarchical clustering. Finally, step three refines the AP clusters into weak AP clusters and then into conserved AP clusters. This improves sequence coverage while maintaining cluster entropy.
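To make the difference between the two alignment algorithms named above concrete, here is a minimal Python sketch of their scoring recurrences. It is not the authors' implementation: the function name `alignment_score`, the unit match/mismatch/gap scores, and the use of plain strings (rather than AP clusters) are all assumptions made for illustration. The only difference between the two modes is that local alignment clamps every cell at zero and reports the best cell anywhere in the table.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Return the optimal alignment score of strings a and b.

    local=False gives the Needleman-Wunsch (global) score;
    local=True gives the Smith-Waterman (local) score.
    The scoring values are illustrative assumptions, not the paper's.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment pays for gaps at the start of either sequence.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                # Local alignment may restart anywhere, so scores never go negative.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both full strings; local: best-scoring substring pair.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    # A shared motif "CHC" embedded in otherwise unrelated flanking residues.
    print(alignment_score("XXCHCYY", "ZCHCZ", local=False))
    print(alignment_score("XXCHCYY", "ZCHCZ", local=True))
```

In a clustering setting, a score of this kind could serve as the similarity between two patterns when deciding which pair to merge next; how the authors actually combine the two algorithms is described in the paper itself.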
The authors conduct in silico tests using three biological datasets: cytochrome c, ubiquitin, and triosephosphate isomerase (TIM). For consistency, they use "the minimum occurrence for each pattern set as half of the total number of sequences multiplied by the percentage of identity and coverage." The results for the three datasets illustrate different aspects of the method: for cytochrome c, AP clusters correlating "to binding sites that richly represent ... binding segments as patterns and ... binding residues as aligned columns"; for ubiquitin, hierarchical clustering performance and AP cluster quality; and for TIM, biological significance.
This is an interesting read about protein analysis; it clearly profiles the nature of the protein functionality domain. The authors claim to have developed a unique and novel AP synthesis process.