This paper proposes the microblogging cube, a new social network analysis (SNA) model for analyzing microblog users and locations along semantic, geographic, and temporal axes. Whereas standard online analytical processing (OLAP) technology enforces rigid summarization across all dimensional hierarchies, the new model applies different measures depending on the level of microblog user and domain aggregation. Given the explosive growth of microblogging data, the model should help people interactively and semantically view and analyze distributed microblogging data at different granularities. Researchers in business intelligence, SNA, ontology, and data mining will want to study this work.
The proposed model is divided into three parts: the microblogging word cube, the microblogging domain-word cube, and the microblogging domain cube. The microblogging word cube conducts word-centered analysis. Its analysis schema includes six types: microblog word, location, user, time, community, and word. Term frequency-inverse document frequency (TF-IDF) is a numeric measure of how important a word is to a document in a collection, and the normalized Google distance (NGD) is a semantic similarity measure for a given set of keywords. First, TF-IDF is used to determine the top-K most representative words for a location or user. Then, NGD is used to calculate the semantic distance between the selected words. From these word distances, the user distance is determined by weighting each distance measure with user-specified factors. Finally, users are aggregated into communities according to the user distance. Because each word in the word cube represents a very specific unit, analysts cannot automatically derive overall knowledge about a user or location from it. The microblogging domain-word cube therefore extends the Open Directory Project (ODP) taxonomy with a new type (Domain_Word) that defines the domain distribution characterizing each user or location. As a result, SNA is based on both words and domains. The microblogging domain cube conducts domain-centered analysis to provide a comprehensive, global view of the microblog data. In this cube, the domain type replaces the word type of the microblogging word cube, and user communities are extracted based on domain relations.
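To make the word cube pipeline concrete, the following is a minimal Python sketch of its first three steps as summarized above. It is not the authors' implementation. The function names (`top_k_tfidf`, `ngd`, `user_distance`) are hypothetical, the TF-IDF variant is the plain textbook one, and the user distance shown is an unweighted mean over word pairs, assumed here only because the summary does not give the paper's user-specified weighting scheme.

```python
import math
from collections import Counter


def top_k_tfidf(docs, k):
    """Return the k highest TF-IDF words for each user or location.

    docs maps a key (user or location) to the list of words it posted.
    Assumption: textbook TF-IDF, tf = count / length, idf = log(N / df).
    """
    n = len(docs)
    df = Counter()
    for words in docs.values():
        df.update(set(words))  # document frequency: count each word once per key
    result = {}
    for key, words in docs.items():
        tf = Counter(words)
        total = len(words)
        scores = {w: (c / total) * math.log(n / df[w]) for w, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[key] = ranked[:k]
    return result


def ngd(fx, fy, fxy, n):
    """Normalized Google distance from occurrence counts.

    fx, fy: counts of documents containing each word; fxy: count containing
    both; n: total number of documents (assumed larger than min(fx, fy)).
    """
    if fxy == 0:
        return float("inf")  # the words never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def user_distance(words_a, words_b, word_dist):
    """Hypothetical user distance: mean word distance over all word pairs.

    The paper weights distance measures with user-specified factors; that
    scheme is not described here, so an unweighted mean stands in for it.
    """
    pairs = [(a, b) for a in words_a for b in words_b]
    return sum(word_dist(a, b) for a, b in pairs) / len(pairs)
```

In a full pipeline, `word_dist` would wrap `ngd` with counts drawn from the tweet collection or a search index, and the resulting user distances would feed a clustering step that forms the communities.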
The authors tested the proposed model on a dataset of approximately 14 million tweets and 1,000 relevant users. The reported results cover only five of these users, and they demonstrate that the proposed approach can perform impressive SNA. However, the paper would have been more complete if the authors had provided results for the remaining 995 users.