Computing Reviews
Minersoft: software retrieval in grid and cloud computing infrastructures
Dikaiakos M., Katsifodimos A., Pallis G. ACM Transactions on Internet Technology 12(1): 1-34, 2012. Type: Article
Date Reviewed: Oct 29 2012

Commercial cloud providers such as Amazon and research grids such as Enabling Grids for E-Science (EGEE) are playing an increasingly important role in computing, especially in handling issues with big data. However, with hundreds of cloud and grid servers, and files on the order of millions, it is a challenge for users to find an appropriate piece of software in the cloud or grid in an efficient and effective manner. Researchers are trying to find answers to the challenge.

Minersoft, a search engine for software packages distributed across computing clouds and grids, is one such attempt. It consists of crawlers that collect software-related information from the clouds, indexers that build inverted indices for search, a data store that holds the information needed for search, a query manager that accepts and processes queries and returns the results to the user, and a job manager that coordinates the work among the components. With it, software and related documents can be examined, indexed, and retrieved on request in various forms, including binaries, source code, software libraries, and software description documents.
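To make the indexing and query steps concrete, here is a minimal sketch of an inverted index with boolean AND retrieval. It is an illustration only, not Minersoft's implementation: the function names (`tokenize`, `build_inverted_index`, `search`), the sample file paths, and the description strings are all hypothetical.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids (here, file paths) containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical crawler output: file path -> extracted description text.
crawled = {
    "/opt/blast/bin/blastn": "BLAST nucleotide sequence alignment binary",
    "/opt/blast/README": "BLAST sequence alignment documentation",
    "/usr/lib/libfftw3.so": "FFTW fast Fourier transform library",
}

index = build_inverted_index(crawled)
results = search(index, "sequence alignment")
```

In this toy setup, `results` holds the two BLAST paths. A production engine would also rank the matches, which this sketch leaves out.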

One of the interesting concepts used by the authors is that of a software graph, which is similar to a map of a file system starting from a root. Each of the leaves is a file, and each node in the tree (interior or leaf) contains metadata that helps identify or categorize the node (file).
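The tree-with-metadata idea can be sketched as follows. This follows only the description given in this review, not the paper's actual data structure; the `Node` class, the `category` metadata key, and the sample paths are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A file or directory in the tree, carrying free-form metadata."""
    name: str
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child node and return it, so calls can be chained."""
        self.children.append(child)
        return child


def find_by_metadata(node, key, value, path=""):
    """Depth-first traversal yielding paths of nodes whose metadata[key] == value."""
    if node.name == "/":
        current = "/"
    else:
        current = path.rstrip("/") + "/" + node.name
    if node.metadata.get(key) == value:
        yield current
    for child in node.children:
        yield from find_by_metadata(child, key, value, current)


# Hypothetical file system fragment, rooted at "/".
root = Node("/", {"category": "directory"})
opt = root.add(Node("opt", {"category": "directory"}))
blast = opt.add(Node("blast", {"category": "directory"}))
blast.add(Node("blastn", {"category": "binary"}))
blast.add(Node("README", {"category": "documentation"}))

binaries = list(find_by_metadata(root, "category", "binary"))
```

Here `binaries` contains the single path `/opt/blast/blastn`. Keeping metadata on interior nodes as well as leaves lets a query match a whole directory (a package, say) rather than only individual files.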

The authors conducted two types of experiments to evaluate Minersoft. The first examines system performance, measuring the number of files the system can index and the time needed to index them. On a grid, the crawler, which is written in Python, processes 100,000 files in five to 30 minutes on average, and the indexer takes 15 to 65 minutes per 100,000 files. The measurements on clouds are of the same order but slower. The second type of experiment concerns the correctness of query results. The authors used several measures, including Precision@10, mean reciprocal rank (MRR), and normalized discounted cumulative gain (NDCG). All three measures show that the system performs very well.
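For readers unfamiliar with the three retrieval measures, the sketch below computes each one using the standard textbook definitions. The relevance judgments are invented for illustration and are not data from the paper; the sketch also assumes the ideal ordering for NDCG is derived from the judged result list itself.

```python
import math


def precision_at_k(relevances, k=10):
    """Fraction of the top-k results that are relevant (relevance > 0)."""
    return sum(1 for r in relevances[:k] if r > 0) / k


def reciprocal_rank(relevances):
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, r in enumerate(relevances, start=1):
        if r > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings):
    """Average reciprocal rank over a set of queries."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def dcg(relevances, k):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(relevances[:k], start=1))


def ndcg(relevances, k=10):
    """DCG normalized by the DCG of the same judgments in ideal (sorted) order."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Made-up judgments for two queries: one relevance grade per returned result.
rankings = [
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
]

p10 = precision_at_k(rankings[0])
mrr = mean_reciprocal_rank(rankings)
ndcg_first = ndcg(rankings[0])
```

Precision@10 counts only how many relevant results appear in the top 10, MRR rewards placing the first relevant result early, and NDCG rewards placing all relevant results early, so the three measures complement one another.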

The system is very interesting and will be useful to cloud and grid communities. Tools like these can help users locate and identify pieces of software that match their needs.

Reviewer: Xiannong Meng
Review #: CR140632 (1302-0130)
Information Search And Retrieval (H.3.3)
Cloud Computing (C.2.4 ...)
Content Analysis And Indexing (H.3.1)
Design Tools and Techniques (D.2.2)
 
