Provenance is information about the entities, activities, and people involved in producing a piece of data or a thing, which can be used to assess its quality, reliability, or trustworthiness. This paper focuses on a new approach to modeling provenance that uses PROV, the standard recommended by the World Wide Web Consortium (W3C), which defines a core model for provenance representation. Readers working in the semantic web, provenance, and ontology fields will want to study this work.
The first part of the paper provides an intuitive overview of the W3C PROV model through an example that gives a complete account of the PROV relations among three types of instances: entities, activities, and agents.
In PROV, physical, digital, conceptual, or other kinds of things are called entities. [...] Activities are how entities come into existence and how their attributes change to become new entities. [...] An agent takes a role in an activity such that the agent can be assigned some degree of responsibility for the activity taking place. [1]
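To make the three types concrete, the following is a minimal sketch in plain Python rather than the paper's own example. The identifiers (ex:draft, ex:article, ex:editing, ex:alice) and the helper classes are hypothetical; only the relation names and their argument order are taken from the PROV data model, and the output is PROV-N-style text rather than a validated PROV-N document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str


@dataclass(frozen=True)
class Activity:
    id: str


@dataclass(frozen=True)
class Agent:
    id: str


@dataclass(frozen=True)
class Relation:
    """One PROV statement linking a subject to an object (hypothetical helper)."""
    name: str
    subject: object
    obj: object

    def to_provn(self) -> str:
        # Render as PROV-N-style text, e.g. used(ex:editing, ex:draft)
        return f"{self.name}({self.subject.id}, {self.obj.id})"


# Hypothetical instances: a draft is edited by Alice into an article.
draft = Entity("ex:draft")
article = Entity("ex:article")
editing = Activity("ex:editing")
alice = Agent("ex:alice")

statements = [
    Relation("used", editing, draft),               # activity used entity
    Relation("wasGeneratedBy", article, editing),   # entity generated by activity
    Relation("wasAssociatedWith", editing, alice),  # agent responsible for activity
    Relation("wasAttributedTo", article, alice),    # entity attributed to agent
    Relation("wasDerivedFrom", article, draft),     # derived entity, then its source
]

for statement in statements:
    print(statement.to_provn())
```

Note that in wasDerivedFrom the derived entity comes first and its source second, which matters when reading the ordering constraints discussed next.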
In addition, the validity of the provenance statements is defined with reference to a set of constraints that the statements must satisfy. For instance, a prov:wasDerivedFrom statement between two entities implies that the source entity was generated before the entity derived from it.
The second part of the paper presents a number of applications that use the PROV model to capture provenance information. In Dictionary, the PROV model asserts the membership of a word in a dictionary and records the history of changes (insertions and removals) to its words. In Scientific Workflows, the PROV model captures information about the data products used and generated by the steps that compose the workflows. As a result, people can more easily debug workflows and reproduce their results. In Executable Documents, the PROV model captures the provenance of each research object to trace its evolution over time. In Smart Cities, the PROV model records provenance information about citizens and their contributions to assist in the verification of collected data.
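As an illustration of the Dictionary application described above, the sketch below replays an insertion and removal history to recover the membership of each dictionary version. It is an assumption-laden example in plain Python, not the paper's encoding: the relation names derivedByInsertionFrom and derivedByRemovalFrom follow the W3C PROV-Dictionary note, while the function names, identifiers, and the empty starting dictionary ex:d0 are hypothetical.

```python
def replay(history, start="ex:d0"):
    """Replay dictionary derivations and return the members of every version.

    Each history item is (relation, new_dictionary, old_dictionary, change).
    Assumes the starting dictionary is empty (an assumption of this sketch).
    """
    versions = {start: {}}
    for relation, new, old, change in history:
        members = dict(versions[old])  # copy so the older version stays intact
        if relation == "derivedByInsertionFrom":
            members.update(change)     # change maps key -> entity id
        elif relation == "derivedByRemovalFrom":
            for key in change:         # change is a set of keys
                members.pop(key, None)
        else:
            raise ValueError(f"unsupported relation: {relation}")
        versions[new] = members
    return versions


def had_member(versions, dictionary, key):
    """Rough analogue of a hadDictionaryMember check for one key."""
    return key in versions[dictionary]


# Hypothetical history: two words are inserted, then one is removed.
history = [
    ("derivedByInsertionFrom", "ex:d1", "ex:d0",
     {"apple": "ex:e_apple", "pear": "ex:e_pear"}),
    ("derivedByRemovalFrom", "ex:d2", "ex:d1", {"pear"}),
]

versions = replay(history)
print(had_member(versions, "ex:d1", "pear"))
print(had_member(versions, "ex:d2", "pear"))
```

Keeping every version, rather than mutating a single dictionary, mirrors the idea that each change yields a new entity whose membership can still be queried later.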
The paper would have been more complete if the authors had provided a deeper analysis of how the applications apply the PROV model in terms of provenance modeling and querying.