Developers often have to deal with data cleaning and data integration tasks. During this work, a database programmer may find that the obtained result differs from what was intended. Finding the root cause of the discrepancy then requires manual searching and debugging.
This paper introduces a method and an algorithm to assist data transformation developers in analyzing their designs. The goal is to correct the data transformation procedure and improve the quality of the result. The paper is highly technical; its reasoning rests on relational algebra and related mathematics, which provide the theoretical foundation for the algorithms and methods presented. The proposed algorithm is given in pseudocode, together with a fine-grained conceptual classification of explanations that offer clues as to why data is missing from the query results, as well as a method to discover the cause of defects through examples.
The paper is primarily aimed at readers with a formal background in relational theory and at researchers in similar fields. The implemented algorithm and the approach, which may be translated into design principles, may also be of practical relevance to data integration and migration professionals.