Information retrieval (IR) has been a useful computing tool for more than three decades. This 13-page journal paper describes some of the changes in IR that arise when it is applied to microblogs, such as the tweets generated by users of the social media application Twitter.
The paper is not a comprehensive survey but an early report covering some key problems encountered in microblog IR. After an introduction to IR, the paper surveys seven observed microblog IR problems: sentiment analysis, opinion mining, entity search, user-generated metadata, authority, influence, and temporal issues. It closes with brief coverage of six outstanding problems: geographic data, data needs, data queries, search relevance, search recency, and search corpus abundance.
Fortunately, IR access to microblog documents is quick and easy, and millions of such documents are readily accessible. While each document is short (fewer than 150 characters), the documents often have distinctive attributes. Some are explicit, such as hashtags. Chief among the implicit attributes is the speed with which microblogs respond to the events they cover. For example, The Wall Street Journal reported that a microblog document announcing the August 2011 earthquake in Virginia was received in New York City before the tremors from that earthquake were felt there [1].
Unfortunately, some document attributes that are commonly desired for IR purposes are not always readily ascertainable from microblog documents. Two common examples are the identity of a document's author and the identity of its intended recipient.
The paper leaves some IR matters, such as geography and security, for future evaluation. Another such matter is content timing: IR-relevant content is sometimes covered in many microblogs by many authors, and sometimes in only a few microblogs by a few authors. The Haiti earthquake is one example. In addition, IR access to microblogs can be used deliberately as a tool, as in the Brazilian work on monitoring dengue fever outbreaks in South America [2].
This paper supplies some helpful pointers for the effective use of IR from microblogs. Even though the paper is an early report, the timing attribute could have been introduced earlier and covered more fully. Overall, the paper provides stimulating reading about a modern use of IR.