LINQS

STATISTICAL RELATIONAL LEARNING GROUP @ UMD

A Collective, Probabilistic Approach to Schema Mapping

International Conference on Data Engineering (ICDE) - 2017
Download the publication: kimmig-icde17.pdf [474 KB]
We propose a probabilistic approach to the problem of schema mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both schema mapping and probabilistic reasoning and contributes novel techniques to both fields. We introduce the problem of mapping selection, that is, of choosing the best set of mappings from a space of potential mappings, given both metadata constraints and a data example. Because selection has to reason holistically about the inputs and the dependencies between the chosen mappings, we define a new schema mapping optimization problem that captures overlap and interactions between mappings. We then introduce Collective Mapping Discovery (CMD), our solution to this problem, which uses state-of-the-art probabilistic reasoning techniques and tolerates inconsistent and incomplete inputs. Using hundreds of realistic integration scenarios, we demonstrate that the accuracy of CMD is more than ten percent above that of metadata-only approaches even for small data examples, and that CMD routinely finds perfect mappings even if a quarter of the data is inconsistent.

BibTeX reference

@InProceedings{kimmig:icde17,
  author       = "Kimmig, Angelika and Memory, Alex and Miller, Renee and Getoor, Lise",
  title        = "A Collective, Probabilistic Approach to Schema Mapping",
  booktitle    = "International Conference on Data Engineering (ICDE)",
  year         = "2017",
}
