Live Web content, such as blogs, Really Simple Syndication (RSS) feeds, and real-time news, consumes an increasing amount of network bandwidth. In addition, consumers (users) often have no effective mechanism for filtering what they receive once they subscribe to a service.
The authors of this paper tackle this problem in the context of RSS. They propose a stateful full-text dissemination system built on a distributed hash table (DHT)-based peer-to-peer (P2P) architecture, and evaluate it using real datasets. The system is shown to “significantly reduce the publishing cost with low maintenance overhead and a high document quality.”
The proposed system works as follows. A number of brokers are deployed between the end users and the live Web content providers. Users subscribe to a content service just as they would without the brokers, but both the subscriptions and the content pushed out by the providers now pass through them. When subscribing, a user must also submit a set of keywords for filtering the content. The brokers then decide which content to forward to the user based on a similarity measure between that keyword set and each incoming document. To reduce the Internet protocol (IP) address lookup cost, a DHT-based P2P service is employed; DHT-based P2P systems offer “high scalability, fault tolerance, and efficient routing.”
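The review summarizes, but does not specify, the similarity measure the brokers use. A minimal sketch of the broker-side filtering step, assuming (as one plausible choice) cosine similarity between the subscription keyword set and a document's bag of terms, with the 0.01 relevance threshold mentioned later in the evaluation:

```python
import math
from collections import Counter

def cosine_similarity(keywords, document_terms):
    """Cosine similarity between a subscription keyword list and a
    document's term list (both assumed to be lowercase tokens)."""
    kw = Counter(keywords)
    doc = Counter(document_terms)
    dot = sum(kw[t] * doc[t] for t in kw)
    norm = (math.sqrt(sum(v * v for v in kw.values()))
            * math.sqrt(sum(v * v for v in doc.values())))
    return dot / norm if norm else 0.0

def broker_forwards(subscription_keywords, document_terms, threshold=0.01):
    """A broker forwards a document to a subscriber only if its
    similarity to the user's keyword set meets the relevance threshold.
    (Hypothetical function names; the paper's exact measure may differ.)"""
    return cosine_similarity(subscription_keywords, document_terms) >= threshold
```

For example, a subscription to `["rss", "p2p"]` would match a document containing those terms but not an unrelated one.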
The authors show both analytically and experimentally that the system works very effectively. For example, with the user-selected term set size fixed at two and the relevance threshold set at 0.01, the selection rate is about 0.5 percent (the lower the rate, the more selective the process), the average number of hops a message travels is about 1.1, and the quality of the disseminated information is very high: both precision and recall exceed 90 percent.
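Precision and recall carry their standard information-retrieval meaning here; a short sketch of how such figures are computed, using hypothetical counts rather than the paper's actual data:

```python
def precision_recall(delivered, relevant):
    """Precision: fraction of delivered documents that are relevant.
    Recall: fraction of relevant documents that were delivered."""
    delivered, relevant = set(delivered), set(relevant)
    hits = len(delivered & relevant)
    precision = hits / len(delivered) if delivered else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: the brokers deliver documents 0-99, of which
# 95 fall in the relevant set 5-104; both metrics come out above 90%.
```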