Category Archives: Context

Ontology Game: Humans Matching Concepts

A new “ontology game” has recently been announced, as a “game with a purpose” to help get humans to categorize objects properly according to a formal ontology.

How it Works

The game operates in a way that’s similar to Google’s image tagging application; pairs of users who do not know one another are presented with the abstract from a Wikipedia page, and they have to choose categories in an upper ontology that accurately describe the article. (E.g. does it correspond to an abstract concept? An agent? A happening?) Users get points when both users choose the same answer to categorize an article. As the game goes on, the categorization gets more and more specific until it “bottoms out” in the upper ontology. At that point, you jump to a new article and start the process over again.

Gameplay

In terms of gameplay, it feels a little bit rough in part because the game doesn’t choose the articles very intelligently. (In one case, I got the same article twice in a row) Also, after you tag 5-6 different articles, the player has a good working knowledge of the taxonomy of the upper ontology, and it becomes less fun as the game devolves into categorization along lines you’ve seen many times before. The key difference here from Google’s image tagging game is that in Google’s game, people enter free-form words, so your input is almost limitless. Oh, and one other thing – in order to categorize properly, you have to read the 2-3 sentence descriptions of what the categories mean, which can take some time the first time around when you have 6-7 categories to choose from.

These don’t appear to me though to be fatal problems for the game, just teething problems. It could be fun if the data set was widened substantially, and the category choice perhaps narrowed a bit. And of course in the background, they’re building an interesting data set mapping Wikipedia articles to high-level concepts of what they represent.

Background

Here’s the original announcement email from Martin Hepp at DERI

We are proud to release the first one in our series of online computer
games that turn core tasks of weaving the Semantic Web into challenging
and interesting entertainment – check it out today athttp://www.ontogame.org/

A very early paper written in late Summer is in Springer LNCS Vol.
4806, 2007, pp. 1222-1232 [1].

A complete Technical Report including our quantitative evidence and video footage will be released shortly on our project Web page at http://www.ontogame.org/

The next series of games for other tasks of building the Semantic Web is already in the pipeline, so please stay tuned 🙂

Please subscribe to our OntoGame mailing list if you want to be informed once new gaming scenarios or results are available. See [2] for details on how to subscribe.

What is it good for?
====================
Despite significant advancement in technology and tools, building ontologies, annotating data, and aligning multiple ontologies remain tasks that highly depend on human intelligence, both as a source of domain expertise and for making conceptual choices. This means that people need to contribute time, and sometimes other resources, to this endeavor.

As a novel solution, we have proposed to masquerade core tasks of weaving the Semantic Web behind on-line, multi-player game scenarios, in order to create proper incentives for humans to contribute. Doing so, we adopt the findings from the already famous “games with a purpose” by von Ahn, who has shown that presenting a useful task, which requires human intelligence, in the form of an on-line game can motivate a large amount of people to work heavily on this task, and this for free.

Since our first experiments in May 2007, we have gained preliminary evidence that (1) users are willing to dedicate a lot of time to those games, (2) are able to produce high-quality conceptual choices, and, by doing so, (3) can unknowingly weave the Semantic Web.

Acknowledgments: OntoGame is possible only thanks to the hard work of the OntoGame team – special thanks to Michael Waltl, Werner Huber, Andreas Klotz, Roberta Hart-Hiller, and David Peer for their dedication and continuous contributions! The work on OntoGame has been funded in part by the Austrian BMVIT/FFG under the FIT-IT Semantic Systems project myOntology (grant no. 812515/9284), http://www.myontology.org, which we gratefully acknowledge.

And now…. play and enjoy!

Best wishes

Martin Hepp and Katharina Siorpaes

Advertisements

“Unpacking” implicit data values

One of the areas where missing data is most often made explicit is in “unpacking” data values.

Recently, I’ve been working with a system called Trio, which integrates uncertainty information and lineage with data into a database. The idea is that you can express fuzzier tuples in a relation. Instead of saying “Mary saw a Toyota”, you can assign it 60% confidence, or even add an alternative – either Mary saw a Toyota, or Mary saw a Honda (but not both).

Trio separates each of the data items in a table into two groups; those that are certain, and those that are uncertain. Unsurprisingly, uncertain data items may have a confidence stored along with them. The more interesting case is what gets put into the “certain” category, and why. As far as I can tell, the “certain” table has no formal definition, it just is the set of data items that don’t need a confidence associated with them. So in fact we’re “packing” several different use cases into that designation of a data item as “certain”:

  • Honest-to-god certain, meaning that it has a confidence rating of 100%
  • Certainty information is unknown, but assumed to be certain
  • Certainty information doesn’t apply

Many people will reasonably point out that it’s OK for there to be nulls, and for them to have some special meaning, it’s just important that their meaning be consistent. When the meaning of nulls can’t be consistent (because the semantics of the domain require that nulls differ in meaning based on context) – then you have a missing data problem. The common approach is then to go “unpack” those null values and enumerate their actual meanings so that they can be stored alongside the data in the future.

Other background – see also the caBIG “Missing Value Reasons” paper, and the flavors of null discussion.

Ontology Matching Book

Ontology Matching

Euzenat and Shvaiko’s book is devoted to ontology matching as a solution to the semantic heterogeneity problem faced by computer systems. Ontology matching finds correspondences between semantically related entities of different ontologies. These correspondences may stand for equivalence as well as other relations, such as consequence, subsumption, or disjointness between ontology entities. Many different matching solutions have been proposed so far from various viewpoints, e.g., databases, information systems, and artificial intelligence.

With Ontology Matching, researchers and practitioners will find a reference book that presents currently available work in a uniform framework. In particular, the presented work and techniques can equally be applied to database schema matching, catalog integration, XML schema matching and other related problems. The book presents the state of the art and the latest research results in ontology matching by providing a detailed account of matching techniques and matching systems in a systematic way from theoretical, practical and application perspectives.

Missing Data and Causal Chains

One of my colleagues today suggested an interesting way of looking at the problem of missing data. He referred back to a lot of work on process modeling, where people in essence try to “reverse engineer” existing business processes.

Let’s say you discover a process, and you are able to identify steps 1, 2, and 4 of the process. Obviously the analyst knows that there was a step 3 somewhere, and the name of the game from that point becomes locating and describing step 3 – the missing data – of the process.

More broadly, steps in a process or information that is gathered are part of some causal chain. If you can identify the causal chain that the missing data belongs to, you at least have a framework for understanding how the missing data relates to other observations, and a starting point for asking the question “what does it mean that this information is missing?”  This causal chain might be thought of as the context surrounding the missing data.

TANSTAAFL: The No-Free-Lunch Theorem

I came across this interesting tidbit while reading one of Numenta’s papers on their HTM approach.

The No-Free-Lunch Theorem:   “no learning algorithm has an inherent advantage over another learning algorithm for all classes of problems.  What matters is the set of assumptions an algorithm exploits in order to learn the world it is trying to model.”

 Ho, Y. C. & Pepyne, D. L. (2002), “Simple Explanation of the No-Free-Lunch Theorem and Its Implications”, Journal of Optimization Theory and Applications V115(3), 549-570.

Or, as Robert Heinlein put it a while back, TANSTAAFL.  There Ain’t No Such Thing As A Free Lunch.

ebXML’s Context Table

ebXML’s context table outlines the different classes of context that the ebXML model recognizes.  “Context” in this case is business context – the set of circumstances around the transmission of business process related information.

ebXML Context

Click the image for a larger-sized picture of what a context representation in ebXML might look like, and where it fits in.

The things that are in this context model are obviously skewed towards electronic interchange of business information (e.g. a high-level context type of “Geopolitical/Regulatory”) but they do have a number of types that could be reinterpreted in much broader ways, for example “Partner Role”, “Info Structural Context”, and everybody’s favorite – “Temporal”.

This is interesting because it’s a practical implementation of context that bites the bullet and categorizes all possible contexts as a fixed number of types.

Annotate anything, anywhere

If you are interested in annotations on ontologies or any other sort of resource, check out the W3C Project Annotea. They are working towards making it possible to load annotations alongside any web resource.  Imagine going to a web page, and clicking on a toolbar button to reveal annotations made by any number of people across the world.  I assume some other mechanisms would be necessary to make sure the annotations, bookmarks, and marginalia were relevant to you, but you get the idea.

They have built an annotation schema for describing the RDF structure of those annotations, and some basic software for creating annotations and storing them on an annotation server.  The mozilla project is hosting Annozilla, an attempt to integrate Annotea into mozilla and firefox.

Each annotation can then become a discussion thread of its own.  Imagine reading a web page on history, and noticing that someone else has posted an annotation that suggests a linkage to some other event that happened in history.  You would then be able to reply to that annotation, ask a question, or dispute its accuracy.

In these kinds of discussion, the discussion’s context would be extremely localized.  An entire thread of discussion might take place about a particular phrase or passage in a document.  Such discussions have always taken place, the difference is that with stored annotations, there is explicit traceability back to the source data that provides “context” for the discussion.  Of course you’ll always have issues such as when the discussion topic drifts.  Still, tracking the seed idea or piece of information that started the discourse has its own benefits.