A colleague recently gave me a copy of an interesting article:
Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching (Sergey Melnik, Erhard Rahm)
In a nutshell, it outlines a method of taking two arbitrary schemas or graphs (think SQL DDL, RDF datasets, or XML schemas) and matching them together to simplify data integration. Their experiments suggest that, on average, about half of the schema matching task can be automated with no understanding of the semantics of the underlying models.
To sum up their algorithm, they take an initial set of mappings between the two graphs that’s based on something simple and easy (e.g. string prefix and suffix matching on node names) and then propagate that similarity through the network. The algorithm’s assumption is that “whenever any two elements in models G1 and G2 are found to be similar, the similarity of their adjacent elements increases”.
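To make the idea concrete, here is a minimal sketch of that fixpoint iteration. The graph encoding, the prefix-based seed similarity, and the max-normalization step are simplifications of my own, not the paper's exact formulation (which works over a derived "propagation graph" with weighted edges).

```python
from itertools import product

def prefix_seed(a, b):
    """Crude initial similarity: length of the shared prefix, normalized."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b))

def similarity_flood(g1, g2, iterations=10):
    """g1, g2: dicts mapping each node name to a list of adjacent node names.
    Returns a dict from (node_in_g1, node_in_g2) pairs to similarity scores."""
    pairs = list(product(g1, g2))
    sigma = {(a, b): prefix_seed(a, b) for a, b in pairs}
    for _ in range(iterations):
        nxt = {}
        for a, b in pairs:
            # Core assumption: similar neighbors boost this pair's similarity.
            incoming = sum(sigma[(x, y)] for x in g1[a] for y in g2[b])
            nxt[(a, b)] = sigma[(a, b)] + incoming
        # Normalize by the maximum so scores stay in [0, 1].
        top = max(nxt.values()) or 1.0
        sigma = {p: v / top for p, v in nxt.items()}
    return sigma

g1 = {"person": ["name"], "name": ["person"]}
g2 = {"personnel": ["pname"], "pname": ["personnel"]}
scores = similarity_flood(g1, g2)
```

Note that "name" and "pname" have a seed similarity of zero (no shared prefix), yet they end up as a top match purely because their neighbors "person" and "personnel" are similar. That is the flooding at work.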
This is an interesting algorithmic approach to schema matching. One thing you see again and again in the data integration space is the use of semi-automated techniques: approaches that assume from the start that humans will come along behind the computer to fix mistakes, annotate with additional information, and so on.