A database is a warehouse of information; its contents can be anything from bank account details for millions of customers to large historical weather datasets used for analysis. Several database management solutions have been developed to provide high-volume data processing capability, scalability, and fault tolerance, and to meet application-specific requirements. With the emerging Internet of Things (IoT), driverless cars, smart sensors for home security, and many other real-time tracking technologies, the past few years have seen a huge surge in data flow. Storing, processing, and querying these high-speed data streams presents challenges to the available database management engines.
This paper introduces a novel database system, ChronicleDB, and describes “a storage layout tailored [to support] high write performance under fluctuating data rates and powerful indexing capabilities to support a variety of queries.” It is an extension of previous research. This extension introduces “user-defined aggregates, a continuous query processing component, and an extended load scheduling approach [that ensures] high ingestion rates under [variable] data rates.” A method for managing large numbers of out-of-order event arrivals, which may occur due to network communication delays, is also introduced.
This research clearly aims to improve storage utilization and write performance, and to provide an economical solution suitable for embedded storage within a monolithic system. A thorough study, analysis, and context-based comparison with readily available databases are included. Readers will benefit from some prior knowledge of the features other databases offer.
The paper is well organized. Section 2 starts with related work and solutions, followed by section 3 on system architecture. Section 4 details the storage layout and section 5 covers the indexing approach. Section 6 introduces a load scheduler and section 7 adds recovery methods as part of the extended work. Section 8 mainly focuses on the event stream processing flow. Finally, sections 9 and 10 provide an evaluation and a conclusion.
Benchmarking is performed against the closest competing database solutions using a variety of real event-based datasets. The captured results clearly show that the proposed solution meets the research objective.
Densely written, the paper requires in-depth reading to grasp the complete idea. While the presented design and approach are programming-language neutral, readers should be well versed in data structures in order to take full advantage of this research.