If there is a topic that has been overhyped, it is big data [1]. A confluence of trends has made it possible to start working with previously unimaginable data in all its volume, velocity, and variety. Most of us do not actually have big data, but we still benefit from the new approaches. What Louridas and Ebert highlight is the blended role of the data scientist, part programmer and part mathematician or statistician, and the tools that support that role.
Louridas and Ebert do a great job framing the challenge the world faces with the ever-increasing amount of data, the need to make it meaningful, and the emergent skills and tools required to do so. They provide concrete examples using data from the World Bank, and they essentially propose an analytics stack: Data-Driven Documents (D3), Python, and R, which matches my experience in practice. Python offers a real programming environment, and R is a true statistical package that combines programming and visualization into a working environment, with a large number of high-quality routines available. Finally, D3, used for the visualization portion of the stack, generates something an end user can consume. These three aspects, programming, statistics, and visualization, are critical to a complete data analytics platform.
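To make the division of labor in that stack concrete, here is a minimal sketch of the Python portion, using a fabricated World Bank-style extract (the file contents, column names, and figures are illustrative, not Louridas and Ebert's actual data). Python does the programming and cleaning, and produces the per-country summary that R or D3 would then analyze or visualize.

```python
import csv
import io
import statistics

# Hypothetical World Bank-style extract: country, year, GDP per capita.
# In practice this would be a downloaded CSV or an API response.
RAW = """country,year,gdp_per_capita
Greece,2012,22082
Greece,2013,21875
Germany,2012,43856
Germany,2013,46286
France,2012,40838
France,2013,42571
"""

def load_rows(text):
    """Parse and clean: coerce types, drop malformed rows."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        try:
            rows.append((rec["country"], int(rec["year"]),
                         float(rec["gdp_per_capita"])))
        except (KeyError, ValueError):
            continue  # the data-preparation step: skip bad records
    return rows

def summarize(rows):
    """Per-country mean, the kind of figure handed to R or D3."""
    by_country = {}
    for country, _year, gdp in rows:
        by_country.setdefault(country, []).append(gdp)
    return {c: statistics.mean(v) for c, v in by_country.items()}

if __name__ == "__main__":
    print(summarize(load_rows(RAW)))
```

Even this toy version shows the pattern: most of the code is preparation and plumbing, and only one line is statistics.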
One challenge is that many of the tools that deal with really big data, such as distributed file systems and managed execution frameworks like Apache Hadoop, are not written to work effectively with the numerous analytics tools available. Still, Hadoop is likely to be part of the overall solution (the mother ship of offline analytics). Unfortunately, most examples deal with flat files, when in fact most systems of any complexity work with databases, feeds, and files, and at different velocities (from end-of-day feeds to real-time sensors). Really big data requires a foundational infrastructure to tackle the largest scale; since most of us do not actually have big data, this may be moot. What is not controversial are the points Louridas and Ebert make: one spends as much time preparing data as analyzing it, and optimization is still likely to come from the software engineering side. All I would add is to make sure you have data scientists leading the analytics [2].
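A small sketch of that heterogeneity, with an in-memory SQLite table standing in for a reference database and a string standing in for an end-of-day flat-file feed (both fabricated for illustration): the join and cleaning logic is ordinary code, which is where the software-engineering effort, and much of the preparation time, actually goes.

```python
import csv
import io
import sqlite3

# Reference data in a database (fabricated schema for illustration).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sensors (id TEXT PRIMARY KEY, site TEXT)")
db.executemany("INSERT INTO sensors VALUES (?, ?)",
               [("s1", "Athens"), ("s2", "Berlin")])

# An end-of-day flat-file feed (also fabricated).
FEED = """sensor_id,reading
s1,17.5
s2,9.1
s1,18.0
"""

# Preparation: join the feed against the database before any analysis.
site_of = dict(db.execute("SELECT id, site FROM sensors"))
readings = {}
for rec in csv.DictReader(io.StringIO(FEED)):
    site = site_of.get(rec["sensor_id"])
    if site is None:
        continue  # unknown sensor: dropped during cleaning
    readings.setdefault(site, []).append(float(rec["reading"]))

# The analysis itself is one line once the data is prepared.
averages = {site: sum(v) / len(v) for site, v in readings.items()}
print(averages)
```

Scale the feed up to a real-time stream and the same join has to move into infrastructure such as Hadoop, but the shape of the problem, reconciling sources at different velocities, stays the same.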