With the accelerating adoption of smartphones in the last decade, mobile applications have become increasingly popular as well, with app stores recording billions of downloads. It is of immense interest to phone manufacturers, carriers, and application developers to understand users and usage behaviors, as such knowledge can influence the design of optimal equipment and networking infrastructure in addition to attractive software.
Research on this topic is ongoing, and the underlying patterns remain elusive and not fully understood. The authors attempt to address this challenge by offering new methods and insights, as well as valuable suggestions.
According to the authors, the research community has so far made the simplifying assumption that users can be characterized by a small number of types. Such a formulation could limit the reproducibility of results across different user studies. The authors suggest a more fine-grained categorization of users that can potentially improve predictability and lead to more reliable conclusions.
The paper is well written and well organized. The introduction explains the background well, and the references provide adequate pointers to the relevant literature. Most of the paper describes how the authors gleaned information from raw usage data collected over a single month. Some interesting ideas are described informally in the text rather than presented formally as algorithms. Nevertheless, these suggestions are worthy of further experimentation by other practitioners and researchers.
The dataset consists of records, each containing a user ID, a timestamp, and the recently used tasks. In a preprocessing step, the apps used are extracted from these records and assigned weights. This step also filters the data: only active users are considered, and some outliers are discarded.
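The review does not give the record format, the weighting scheme, or the activity thresholds, so the following Python sketch fills them in with hypothetical choices: each record's weight of 1.0 is split evenly over the apps it lists, and a user is kept only with at least MIN_ACTIVE_DAYS active days and at most MAX_EVENTS_PER_USER app events. It only illustrates the shape of such a preprocessing step; it is not the authors' actual procedure.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable

# Hypothetical thresholds; the review does not report the paper's values.
MIN_ACTIVE_DAYS = 2
MAX_EVENTS_PER_USER = 10_000  # users above this are treated as outliers

def preprocess(
    records: Iterable[tuple[str, datetime, list[str]]],
) -> dict[str, list[tuple[datetime, str, float]]]:
    """Turn (user_id, timestamp, recent_tasks) records into weighted app events.

    Assumed weighting: each record carries a total weight of 1.0, split evenly
    across the apps it lists. Inactive users and outliers are dropped."""
    usage = defaultdict(list)
    active_days = defaultdict(set)
    for user_id, ts, tasks in records:
        if not tasks:
            continue  # no app listed, nothing to credit
        weight = 1.0 / len(tasks)
        for app in tasks:
            usage[user_id].append((ts, app, weight))
        active_days[user_id].add(ts.date())
    # Keep only users that are active enough but not extreme outliers.
    return {
        uid: events
        for uid, events in usage.items()
        if len(active_days[uid]) >= MIN_ACTIVE_DAYS
        and len(events) <= MAX_EVENTS_PER_USER
    }
```

With two days of records for one user and a single day for another, only the first user survives the activity filter, and that user's event weights sum to the number of records kept.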
Because there are thousands of apps, they are grouped into 29 semantic categories. With each day divided into four parts, and holidays and weekdays treated separately, each user is represented by a vector of 29 × 4 × 2 = 232 dimensions.
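How a user's weighted app events might be folded into such a vector can be sketched in Python. The review does not say where the day-part boundaries fall, how holidays are determined, or how the features are ordered, so this sketch assumes four equal 6-hour slots, treats weekends as the only holidays (a hypothetical is_weekend stand-in for a real holiday calendar), and takes a hypothetical category_of mapping from app name to category id.

```python
from datetime import date, datetime

N_CATEGORIES = 29  # semantic app categories
N_DAY_PARTS = 4    # assumed: four 6-hour slots per day
N_DAY_TYPES = 2    # 0 = weekday, 1 = holiday
DIM = N_CATEGORIES * N_DAY_PARTS * N_DAY_TYPES  # 232

def day_part(hour: int) -> int:
    """Map an hour (0-23) to a slot: 0 = 0-5h, 1 = 6-11h, 2 = 12-17h, 3 = 18-23h."""
    return hour // 6

def is_weekend(d: date) -> bool:
    """Hypothetical holiday test: weekends only (a real one would use a holiday calendar)."""
    return d.weekday() >= 5

def feature_index(category: int, hour: int, holiday: bool) -> int:
    """Flatten (category, day part, day type) into a single index in [0, DIM)."""
    return (category * N_DAY_PARTS + day_part(hour)) * N_DAY_TYPES + int(holiday)

def user_vector(events: list[tuple[datetime, str, float]], category_of, is_holiday=is_weekend):
    """Sum event weights per (category, day part, day type) into a DIM-element list.

    category_of: hypothetical mapping from app name to category id in [0, 29)."""
    vec = [0.0] * DIM
    for ts, app, weight in events:
        vec[feature_index(category_of[app], ts.hour, is_holiday(ts.date()))] += weight
    return vec
```

Any fixed ordering works, as long as every user vector uses the same one so that the clustering step compares like with like.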
The clustering step operates on these user vectors and uses a hybrid method that combines the speed of k-means with the more natural scheme of MeanShift, which does not require the number of clusters to be specified in advance, to determine 382 clusters representing the user categories. This k-means-MeanShift hybrid is a novel contribution by the authors.
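The review does not describe how the two methods are combined, so the following Python sketch is a hypothetical reconstruction of one common pattern, not the authors' algorithm: k-means first over-segments the users into k small groups (cheap), and a flat-kernel MeanShift then runs on the group centroids, weighted by group size, so that the final number of clusters emerges from the bandwidth rather than being fixed. The parameters k and bandwidth, and all function names, are illustrative assumptions.

```python
import random

def _dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _mean(points, weights=None):
    """(Weighted) coordinate-wise mean of a non-empty list of points."""
    weights = weights or [1.0] * len(points)
    total = sum(weights)
    return [sum(w * p[i] for p, w in zip(points, weights)) / total
            for i in range(len(points[0]))]

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means with randomly sampled initial centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: _dist2(p, centers[j]))].append(p)
        centers = [_mean(g) if g else centers[j] for j, g in enumerate(groups)]
    labels = [min(range(k), key=lambda j: _dist2(p, centers[j])) for p in points]
    return centers, labels

def mean_shift(points, weights, bandwidth, iters=50):
    """Flat-kernel MeanShift: each point climbs to the weighted mean of its
    neighbors within `bandwidth`; modes closer than bandwidth/2 are merged."""
    modes = []
    for start in points:
        x = list(start)
        for _ in range(iters):
            near = [(p, w) for p, w in zip(points, weights)
                    if _dist2(p, x) <= bandwidth ** 2]
            new_x = _mean([p for p, _ in near], [w for _, w in near])
            if _dist2(new_x, x) < 1e-12:
                break
            x = new_x
        modes.append(x)
    merged, labels = [], []
    for m in modes:
        for j, c in enumerate(merged):
            if _dist2(m, c) <= (bandwidth / 2) ** 2:
                labels.append(j)
                break
        else:
            merged.append(m)
            labels.append(len(merged) - 1)
    return merged, labels

def kmeans_meanshift(points, k, bandwidth, seed=0):
    """Hypothetical hybrid: over-segment with k-means, then run MeanShift on the
    size-weighted centroids so the final cluster count is not fixed in advance."""
    centers, labels = kmeans(points, k, seed=seed)
    sizes = [labels.count(j) for j in range(k)]
    used = [j for j in range(k) if sizes[j] > 0]  # skip empty k-means groups
    modes, mode_of = mean_shift([centers[j] for j in used],
                                [sizes[j] for j in used], bandwidth)
    center_to_mode = {j: mode_of[i] for i, j in enumerate(used)}
    return modes, [center_to_mode[lab] for lab in labels]
```

In this design, MeanShift only ever sees k centroids instead of every user, which is where the speedup comes from; choosing k well above the expected number of clusters leaves the bandwidth as the main parameter that controls how many clusters survive.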
Most of the clusters contain between 100 and 300 users, and the biggest contains 4,981 users. Because 232 features are still too many to characterize a group, the authors use a feature selection method based on ranking general and idiosyncratic features. Results are shown for three large clusters and three small clusters.
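The review does not define the ranking criteria, so the following Python sketch uses hypothetical ones to show the idea: a feature counts as "general" for a cluster when its mean usage inside the cluster is high, and as "idiosyncratic" when its in-cluster mean most exceeds the population mean (a lift ratio). The function names, the top parameter, and the lift measure are assumptions, not the authors' method.

```python
def column_means(vectors):
    """Mean of each feature over a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def rank_features(cluster_vectors, all_vectors, top=3, eps=1e-9):
    """Return (general, idiosyncratic) feature indices for one cluster.

    general: highest mean usage inside the cluster.
    idiosyncratic: highest lift, i.e. in-cluster mean / population mean
    (eps avoids division by zero for features nobody uses)."""
    inside = column_means(cluster_vectors)
    overall = column_means(all_vectors)
    dims = range(len(inside))
    general = sorted(dims, key=lambda i: inside[i], reverse=True)[:top]
    lift = [inside[i] / (overall[i] + eps) for i in dims]
    idiosyncratic = sorted(dims, key=lambda i: lift[i], reverse=True)[:top]
    return general, idiosyncratic
```

For example, a feature that every user exercises heavily will rank as general for almost any cluster but never as idiosyncratic, whereas a feature that is rare in the population but common in one cluster, such as phone and SMS use past midnight, is what sets that cluster apart.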
The biggest cluster is that of night communicators, who use the phone and SMS past midnight more often than other users. Their use of clock apps in the morning, shopping apps in the afternoon, and music apps on holiday evenings is relatively less common.
The second biggest cluster is that of screen checkers, who wake up the phone, perhaps to check the time or any notifications, but do not unlock the screen.
The smallest cluster reported has 113 users, who use financial apps more often: in particular, stock apps on holiday mornings and on weekday mornings and afternoons. They also use navigation apps more often on weekday mornings, puzzle games on holiday afternoons, and weather apps on holiday mornings.
For more correlations like these, covering clusters such as evening learners, young parents, and car lovers, readers can refer to the full paper. Researchers, phone designers, and application developers will find these details interesting. Overall, this is an impressive paper that is innovative, timely, and thought-provoking.