
Efficiently Comparing Provenance for Knowledge Discovery

Abstract


Provenance is a record that describes the entities and processes involved in producing, delivering, and influencing a resource. Provenance management and reuse can enable interesting applications for knowledge discovery and analytics. One crucial component of a provenance management system is the comparison of provenance records. In the era of big data, provenance management systems need a scalable algorithmic solution for efficient comparison. Existing solutions to the problem have a large memory footprint and incur long system response times. In this paper, we present a new solution to threshold-based provenance comparison. It models provenance directly as graphs and measures their similarity using provenance edit distance. We first provide analytic results on the expected search space of the existing and the proposed solutions. On top of the depth-first search paradigm, we design an algorithm, PEDSim, that uses an encoding technique specific to provenance graphs and quantifiable heuristics. Extensive experiments on real data demonstrate the superiority of our method over the alternatives.
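
To make the idea of threshold-based comparison concrete, the following is a minimal sketch of a depth-first search that decides whether the edit distance between two small labeled directed graphs is at most a threshold `tau`. It is not PEDSim: the abstract does not describe the paper's encoding or heuristics, so this sketch prunes only on the accumulated partial cost. The function name `within_threshold`, the unit edit costs, the absence of self-loops, and the vertex labels in the usage example are all assumptions made for illustration.

```python
def within_threshold(labels1, edges1, labels2, edges2, tau):
    """Return True if the graph edit distance between two labeled directed
    graphs is at most tau. Assumes unit costs for vertex insertion, deletion,
    and relabeling and for edge insertion and deletion, and no self-loops.

    labels*: dict mapping vertex -> label; edges*: set of (source, target).
    """
    order = list(labels1)    # vertices of graph 1, mapped one at a time
    targets = list(labels2)  # candidate images in graph 2

    def edge_cost(u, x, mapping):
        # Edge edits between u (mapped to x) and every already-mapped vertex.
        cost = 0
        for w, y in mapping.items():
            for a, b, c, d in ((u, w, x, y), (w, u, y, x)):
                in1 = (a, b) in edges1
                in2 = c is not None and d is not None and (c, d) in edges2
                cost += in1 != in2  # one edit when presence differs
        return cost

    def dfs(i, mapping, used, cost):
        if cost > tau:  # partial cost is a lower bound, so prune this branch
            return False
        if i == len(order):
            # Unmatched vertices of graph 2 and their incident edges are insertions.
            cost += sum(1 for v in targets if v not in used)
            cost += sum(1 for a, b in edges2 if a not in used or b not in used)
            return cost <= tau
        u = order[i]
        for x in [v for v in targets if v not in used] + [None]:
            # x is None means u is deleted; otherwise pay 1 if labels differ.
            step = 1 if x is None or labels1[u] != labels2[x] else 0
            step += edge_cost(u, x, mapping)
            mapping[u] = x
            if x is not None:
                used.add(x)
            found = dfs(i + 1, mapping, used, cost + step)
            del mapping[u]
            if x is not None:
                used.discard(x)
            if found:
                return True
        return False

    return dfs(0, {}, set(), 0)


# Usage: the second graph extends the first by one vertex and one edge.
g1_labels = {"a": "entity", "b": "activity"}
g1_edges = {("a", "b")}
g2_labels = {"x": "entity", "y": "activity", "z": "agent"}
g2_edges = {("x", "y"), ("y", "z")}

print(within_threshold(g1_labels, g1_edges, g2_labels, g2_edges, 2))
print(within_threshold(g1_labels, g1_edges, g2_labels, g2_edges, 1))
```

Because every edit cost is non-negative, the cost of a partial mapping never decreases as the search deepens, which is what makes abandoning a branch as soon as it exceeds `tau` safe. A stronger lower bound on the remaining cost would prune more of the search space, which is the role the abstract attributes to its quantifiable heuristics.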
