Computing Reviews, the leading online review service for computing literature.

Scalable access to scientific data
Menasce D. IEEE Internet Computing 9(3): 94-96, 2005. Type: Article

Date Reviewed: Jan 25 2006

The quantity of data stored online quadruples every 18 months. To transform this data into information and knowledge, it is essential to develop effective data analysis and extraction capabilities. This clearly written paper presents a typical environment in which data is collected, stored, and validated at “ground stations” that send this data to “data product generation centers which generate higher level data products.” (The textual representation on page 94 is substantially better and semantically richer than the pictorial one in Figure 1.) The generation cycle of these centers consists of a data product generation phase and a quiet period, during which new data accumulates. The paper does not refer to the semantics of data product generation, or to the semantics of relationships between data, despite the clear need for data collections and data products to be structured, since end users request data products of interest.
A very simple analytical model (based on processor and input/output (I/O) subsystem utilization) is presented to answer such questions as: What is the maximum data product retrieval rate? What is the average data product retrieval time for a given data product retrieval rate? What is the impact of quiet period duration on the data product retrieval time? Some numerical results obtained with this model appear to agree with qualitative considerations. The model and the results are straightforward, and they appear to handle only some, and not the most interesting (or important), aspects of scalability: first, data selection of any kind ought to be based on semantic considerations (not addressed in the paper), and, second, concepts such as parallelism or indexing are not mentioned at all.
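As a rough illustration of what a utilization-based model of this kind can look like, the following is a minimal sketch, assuming a textbook open queueing approximation with two resources (processor and I/O subsystem). It is not the paper's model: the function names, the per-retrieval service demand parameters, and the sample values are hypothetical, and the quiet period is not modeled.

```python
def max_retrieval_rate(cpu_demand_s, io_demand_s):
    """Upper bound on retrievals per second.

    Assumption: each retrieval needs cpu_demand_s seconds of processor time
    and io_demand_s seconds of I/O time; the busier resource saturates first.
    """
    return 1.0 / max(cpu_demand_s, io_demand_s)


def avg_retrieval_time(rate_per_s, cpu_demand_s, io_demand_s):
    """Average retrieval time in seconds at a given retrieval rate.

    Assumption: each resource behaves like a simple open queue, so the time
    spent at a resource is demand / (1 - utilization).
    """
    total = 0.0
    for demand in (cpu_demand_s, io_demand_s):
        utilization = rate_per_s * demand
        if utilization >= 1.0:
            raise ValueError("retrieval rate exceeds resource capacity")
        total += demand / (1.0 - utilization)
    return total


# Hypothetical demands: 20 ms of processor time and 50 ms of I/O per retrieval.
print(max_retrieval_rate(0.02, 0.05))
print(avg_retrieval_time(10.0, 0.02, 0.05))
```

Under these assumptions, the retrieval time grows without bound as the rate approaches the bottleneck limit, which is the qualitative behavior a model of this type is meant to capture.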
Finally, the author emphasizes the need to provide semantically rich ontologies associated with metadata, in order to solve the problem of semantic information integration across different scientific domains, a problem that arises, in particular, from differing terminologies. This important problem, and ways to solve it, have been well known for decades; no references are provided in the paper. In summary, the paper may be a good illustration of Hayek’s observation that, in the use of statistics, we “either deliberately ignore or are ignorant of the relations between the individual elements with different attributes,” while, in dealing with complexity, “it is precisely the[se] relations that matter” [1].

Reviewer: H. I. Kilov	Review #: CR132359 (0609-0959)

1) Hayek, F.A. The critical approach to science and technology (In honor of Karl R. Popper). The Free Press of Glencoe, 1964.

Abstracting Methods (H.3.1 ... )

Data Manipulation Languages (DML) (H.2.3 ... )

Data Models (H.2.1 ... )

Web-Based Services (H.3.5 ... )

Information Search And Retrieval (H.3.3 )

Languages (H.2.3 )

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®