Funding year 2017 / Scholarship Call #12 / Project ID: 2418 / Project: Decentralised Data Provenance based on the Blockchain
In the previous blog entries, we introduced the topic in rather general terms. We gave a brief overview of provenance and what it can be used for, and we described some of its problems that the blockchain could help solve. Today we take a more detailed look at what provenance information actually looks like.
In 2013, the World Wide Web Consortium (W3C) published the PROV family of documents, which describes how to achieve interoperable interchange of provenance information in heterogeneous environments such as the Web. This family of documents is a collection of W3C Recommendations and Notes defining a provenance model, notations, extension points, etc. These documents build on the research of previous years and make extensive use of results from earlier scientific projects and advancements such as the Open Provenance Model and the Proof Markup Language.
A formal definition of how to represent provenance information is needed to allow its automated exchange and processing. If every service that wants to store provenance information defined its own model and notation, different actors and services could never work together to produce a complete picture of the recorded provenance.
At its core, PROV defines the following model elements depicted as boxes.
- Entity: An entity represents a thing whose provenance we want to describe. An entity can be physical, digital, conceptual or any other kind of thing.
- Activity: An activity is some action that happens over time and works on or with entities. Activities can, for example, create, consume or transform entities.
- Agent: An agent is someone or something that takes responsibility for actions taking place. Agents can be related to entities, activities or other agents. An agent can itself be an entity or an activity, which allows the model to record the provenance of an agent as well.
These elements have a set of relationships to each other depicted as connections between the boxes.
- WasGeneratedBy: This represents the generation of an entity by an activity.
- Used: This marks the beginning of an activity utilizing an entity.
- WasInformedBy: This represents the communication between two activities. More specifically, one activity uses an entity generated by the other activity.
- WasDerivedFrom: A derivation represents the transformation of an entity into another.
- WasAttributedTo: This represents the attribution of an entity to an agent, for example a blog post to its author.
- WasAssociatedWith: This represents the assignment of responsibility for an activity to an agent.
- ActedOnBehalfOf: This represents the assignment of authority and responsibility from one agent to another for a specific activity. The agent on whose behalf the activity is carried out retains part of the responsibility for it.
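To make the element and relationship lists above more concrete, here is a minimal sketch in Python of how they could be modelled. It is not part of PROV or of any PROV library: the class names `Element` and `Relation`, the table `CORE_RELATIONS` and the identifiers `ex:blogPost` and `ex:author` are assumptions made for illustration only. For simplicity, the sketch also gives every element exactly one kind, so it does not cover the case of an agent that is itself an entity or an activity.

```python
from dataclasses import dataclass

# Core relationships and the kinds of element they connect,
# written as (kind of the first element, kind of the second element).
CORE_RELATIONS = {
    "wasGeneratedBy": ("entity", "activity"),
    "used": ("activity", "entity"),
    "wasInformedBy": ("activity", "activity"),
    "wasDerivedFrom": ("entity", "entity"),
    "wasAttributedTo": ("entity", "agent"),
    "wasAssociatedWith": ("activity", "agent"),
    "actedOnBehalfOf": ("agent", "agent"),
}


@dataclass(frozen=True)
class Element:
    """A box in the graph: an entity, an activity or an agent (hypothetical class)."""
    kind: str
    identifier: str


@dataclass(frozen=True)
class Relation:
    """A connection between two boxes (hypothetical class)."""
    name: str
    subject: Element
    target: Element

    def __post_init__(self):
        # Reject relations that connect the wrong kinds of element.
        expected = CORE_RELATIONS[self.name]
        actual = (self.subject.kind, self.target.kind)
        if actual != expected:
            raise ValueError(
                f"{self.name} connects {expected[0]} to {expected[1]}, "
                f"got {actual[0]} to {actual[1]}"
            )


# The blog post example: an entity attributed to its author (made-up identifiers).
post = Element("entity", "ex:blogPost")
author = Element("agent", "ex:author")
attribution = Relation("wasAttributedTo", post, author)
```

Swapping the two arguments, as in `Relation("wasAttributedTo", author, post)`, raises a `ValueError` in this sketch, because an attribution always points from an entity to an agent.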
Besides the core structures briefly presented here, the PROV model also contains extended structures that add more detail and expressiveness. The PROV documents also discuss different notations and how to transform PROV model-based data into other important notations. One of the simplest human-readable ways to display provenance information is as a graph. In the following example, we show the authors' view of the provenance of one of the PROV documents. Agents are depicted in orange, activities in blue and entities in yellow.
We can see that the document was generated by the edit1 activity, which was associated with two agents, one in the role of contributor and the other in the role of editor.
However, the graphical representation is not meant to represent all the details in the model. The same provenance, written in the notation defined by the PROV-N document, could look like this:
- entity(tr:WD-prov-dm-20111215, [ prov:type="document", ex:version="2" ])
- activity(ex:edit1, [ prov:type="editing" ])
- wasGeneratedBy(tr:WD-prov-dm-20111215, ex:edit1, -)
- agent(ex:Paolo, [ prov:type='prov:Person' ])
- agent(ex:Simon, [ prov:type='prov:Person' ])
- wasAssociatedWith(ex:edit1, ex:Paolo, -, [ prov:role="editor" ])
- wasAssociatedWith(ex:edit1, ex:Simon, -, [ prov:role="contributor" ])
Going into the details of the notation is outside of the scope of this blog entry and the above example is meant only to provide a feeling for the more complex and complete representation of provenance information as defined by PROV.
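Without going into the notation itself, the following Python sketch shows one way such statements could be produced by a program. It only reproduces the strings of the example above and is not a PROV-N implementation; the helper names `format_attributes` and `statement` are made up for this illustration. Attribute values are passed in already quoted, because the example uses double quotes for plain strings and single quotes for qualified names such as prov:Person.

```python
def format_attributes(attributes):
    """Render a dict of attributes as the bracketed list used in the example."""
    if not attributes:
        return ""
    pairs = ", ".join(f"{key}={value}" for key, value in attributes.items())
    return f", [ {pairs} ]"


def statement(name, *arguments, attributes=None):
    """Build one statement: a name, positional arguments, optional attributes."""
    return f"{name}({', '.join(arguments)}{format_attributes(attributes)})"


document = "tr:WD-prov-dm-20111215"

statements = [
    statement("entity", document,
              attributes={"prov:type": '"document"', "ex:version": '"2"'}),
    statement("activity", "ex:edit1", attributes={"prov:type": '"editing"'}),
    # "-" marks an optional argument that is left out (here: the time).
    statement("wasGeneratedBy", document, "ex:edit1", "-"),
    statement("agent", "ex:Paolo", attributes={"prov:type": "'prov:Person'"}),
    statement("agent", "ex:Simon", attributes={"prov:type": "'prov:Person'"}),
    statement("wasAssociatedWith", "ex:edit1", "ex:Paolo", "-",
              attributes={"prov:role": '"editor"'}),
    statement("wasAssociatedWith", "ex:edit1", "ex:Simon", "-",
              attributes={"prov:role": '"contributor"'}),
]

for line in statements:
    print(line)
```

Keeping the statements as plain data until the last step makes it easy to swap the output format later, for example for one of the other notations discussed in the PROV documents.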
In our next blog post, we will look at how to find provenance information across multiple domains and stakeholders.
Svetoslav Videnov
My master's thesis aims to combine the advantages of the blockchain with data provenance. The blockchain is a distributed ledger that allows data to be persisted in an unchangeable way. Data provenance is an approach to tracking what happened to data, which makes it possible to build trust in that data.