Provenance Reconstruction 1: Version Controlled Documents

The ground truth (groundtruth.ttl) for the first dataset was generated from a number of GitHub repositories using the Git2PROV tool (http://git2prov.org). As raw data, you receive every version of each file that was ever present in the repositories, including deleted files. However, the filenames are randomized to simulate a scenario in which all provenance has been lost. Be warned also that, because of the randomized filenames, the timing metadata associated with the files may differ from the original. The correct timings can be found in the ground-truth provenance (see the prov:atTime property of the qualified generations).


Additional Info

Field          Value
Author         Paul Groth, Tom De Nies, Ruben Verborgh, Sara Magliacane
Last Updated   June 13, 2014, 11:20 (UTC)
Created        June 13, 2014, 11:13 (UTC)
Year           2014