Computing Reviews, the leading online review service for computing literature.

Minersoft: software retrieval in grid and cloud computing infrastructures
Dikaiakos M., Katsifodimos A., Pallis G. ACM Transactions on Internet Technology 12(1): 1-34, 2012. Type: Article

Date Reviewed: Oct 29 2012

Commercial cloud providers such as Amazon and research grids such as Enabling Grids for E-Science (EGEE) are playing an increasingly important role in computing, especially in handling big data. However, with hundreds of cloud and grid servers and millions of files, it is a challenge for users to find an appropriate piece of software in the cloud or grid efficiently and effectively. Minersoft, a search engine for software packages distributed across computing clouds and grids, is one attempt to meet this challenge.
Minersoft consists of crawlers that collect software-related information from the clouds, indexers that build inverted indices for search, data storage that holds all information and data for search, a query manager that accepts and processes queries and returns the results to the user, and a job manager that coordinates the work among the components. Users can examine, index, and retrieve on request software and related documents in various forms, including binaries, source code, software libraries, and software description documents.
One of the interesting concepts used by the authors is that of a software graph, which is similar to a map of a file system starting from a root. Each leaf is a file, and each node in the tree (interior or leaf) contains metadata that helps identify or categorize that node.
The authors conducted two types of experiments to evaluate Minersoft. The first examines system performance, measuring the number of files the system can index and the time needed to index them. On a grid, the crawler, which is written in Python, reads 100,000 files in five to 30 minutes on average, and the indexer takes 15 to 65 minutes per 100,000 files. On clouds, the measurements are of the same order but slower.
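To make the indexer and query manager roles described above more concrete, the following is a minimal sketch of an inverted index over file descriptions. It is an illustration only, not Minersoft's implementation: the function names, the whitespace tokenizer, the AND-only query semantics, and the sample paths are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of file paths whose text contains it.

    `docs` is a dict of {file_path: description_text}; both are hypothetical.
    """
    index = defaultdict(set)
    for path, text in docs.items():
        for term in text.lower().split():
            index[term].add(path)
    return index


def search(index, query):
    """Return the paths that contain every query term (simple AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data standing in for crawled software descriptions.
sample_docs = {
    "/opt/blast/README": "sequence alignment tool",
    "/opt/fftw/README": "fast fourier transform library",
}
sample_index = build_inverted_index(sample_docs)
```

A real system would add ranking, stemming, and the file-system metadata carried by the software graph; this sketch only shows the term-to-file lookup that an inverted index provides.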
The other type of measurement concerns query/answer correctness. The authors used multiple types of measurements, including Precision@10, mean reciprocal rank (MRR), and normalized discounted cumulative gain (NDCG). All three measures show that the system performs very well. The system is very interesting and will be useful to cloud and grid communities. Tools like these can help users locate and identify pieces of software that match their needs.
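For readers unfamiliar with the three ranking metrics named above, the following sketch shows one common way to compute them. It assumes each query's results are given as a ranked list of relevance grades (0 for irrelevant) and uses the linear-gain form of DCG with a log2 discount; the paper may use a different variant, and the function names here are chosen only for illustration.

```python
import math


def precision_at_k(relevance, k=10):
    """Fraction of the top-k results that are relevant (grade > 0)."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def reciprocal_rank(relevance):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, r in enumerate(relevance, start=1):
        if r > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings):
    """Average reciprocal rank over a non-empty list of per-query rankings."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def dcg(relevance, k):
    """Discounted cumulative gain: grade / log2(rank + 1), summed over top k."""
    return sum(r / math.log2(i + 1)
               for i, r in enumerate(relevance[:k], start=1))


def ndcg(relevance, k=10):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevance, reverse=True), k)
    return dcg(relevance, k) / ideal if ideal > 0 else 0.0
```

All three metrics reward relevant results near the top of the list; a value close to 1 means the first few results are mostly relevant.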

Reviewer: Xiannong Meng	Review #: CR140632 (1302-0130)

Information Search And Retrieval (H.3.3)

Cloud Computing (C.2.4 ...)

Content Analysis And Indexing (H.3.1)

Design Tools and Techniques (D.2.2)
