This book is an output from the German Research Foundation’s priority programme SPP 1736 on “Algorithms for Big Data.” SPP 1736 funded 15 projects, and a few projects with their own funding were also associated with the programme. There are eight papers under the broad heading “Algorithms for Large and Complex Networks,” and six under the broad heading “Algorithms for Big Data and Their Applications.”
As might be expected, the papers differ substantially in the breadth of their applicability. One of potentially wider applicability is Albers’ paper on energy-efficient scheduling. We assume that a processor running at speed s consumes power s^α (typically α = 2 or 3) and ask how best to schedule jobs, particularly in an online setting where we don’t know the future load. However, the author points out that the offline setting is more relevant than one might think, as it covers predicting the future from the past. There are variants with heterogeneous processors, and where machines can be powered down (but powering up costs energy).
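To make the model concrete, here is a minimal sketch that is an illustration only and is not taken from the paper. It reads s^α as the power drawn at speed s (energy per unit time), so a phase at speed s lasting t time units costs s^α · t. The function names, the choice α = 3, and the two example schedules are all assumptions made for the illustration.

```python
def schedule_energy(phases, alpha=3.0):
    """Total energy of a schedule given as (speed, duration) phases.

    Assumes power = speed ** alpha, so each phase costs speed ** alpha * duration.
    """
    return sum(speed ** alpha * duration for speed, duration in phases)


def work_done(phases):
    """Total work processed: speed multiplied by duration, summed over phases."""
    return sum(speed * duration for speed, duration in phases)


# Hypothetical example: 10 units of work must finish within 5 time units.
steady = [(2.0, 5.0)]                          # constant speed throughout
bursty = [(4.0, 2.0), (1.0, 2.0), (0.0, 1.0)]  # sprint, slow down, then idle

print(work_done(steady), schedule_energy(steady))  # 10.0 40.0
print(work_done(bursty), schedule_energy(bursty))  # 10.0 130.0
```

Both schedules complete the same work in the same window, but because s^α is convex for α > 1, the constant-speed schedule uses far less energy. This is the reason speed scaling is worth optimizing, and the online difficulty is that the right constant speed depends on load that has not yet arrived.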
Some papers are software-oriented, for example, Angriman et al.’s paper on the NetworKit toolkit for large-scale network analysis. There’s also Giesen et al.’s paper “The GENO Software Stack”; GENO (generic optimization) is a domain-specific language (DSL) for mathematical optimization. One component is autoBLAS, a compiler that translates formal linear algebra expressions into optimized BLAS calls. Again, this might have wider applicability.
In their paper “Scalable Cryptography,” Hofheinz and Kiltz raise an interesting question relevant to big data. With plausible parameters, PKCS is secure against 2^80 attacks. However, if we assume 2^30 users (many fewer than the number of mobile phones) and 2^30 ciphertexts, our provable security drops to 2^20, which is trivial. This team’s work provides the first identity-based encryption (IBE) scheme whose security properties do not degrade in the number of ciphertexts.
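The arithmetic behind that drop can be written out as follows. This is a sketch that assumes the security proof loses a factor equal to the number of users multiplied by the number of ciphertexts, which is my reading of the figures quoted above rather than a statement taken from the paper.

```latex
\[
\frac{2^{80}}{2^{30} \cdot 2^{30}} = 2^{80 - 30 - 30} = 2^{20} \approx 10^{6}
\]
```

A guarantee against only about a million operations is well within reach of commodity hardware, which is why a proof whose loss does not grow with the number of ciphertexts matters at this scale.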
There are also papers on genome assembly, scalable text indices, and much else.
This book covers a wide range of topics in big data research. If I were running a master’s program in big data, I would use this book as a source for dissertations. It’s hard to envisage anyone (except perhaps a starting PhD student wanting to get a feel for the range of big data research) reading the entire book, but the individual papers will have their own readerships.