Ontology Game: Humans Matching Concepts

A new “ontology game” has recently been announced: a “game with a purpose” designed to enlist humans in categorizing objects properly according to a formal ontology.

How it Works

The game operates in a way that’s similar to Google’s image-tagging application: pairs of users who do not know one another are presented with the abstract from a Wikipedia page, and they have to choose the categories in an upper ontology that accurately describe the article. (E.g., does it correspond to an abstract concept? An agent? A happening?) Both players earn points when they choose the same answer to categorize an article. As the game goes on, the categorization gets more and more specific until it “bottoms out” at the lowest level of the upper ontology. At that point, you jump to a new article and start the process over again.
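To make the consensus mechanic concrete, here’s a minimal sketch of one round in Python. Everything in it is my own invention for illustration – the toy ontology slice, the player objects with a categorize method, and the one-point-per-agreement scoring – not OntoGame’s actual implementation:

    # Hypothetical sketch of the OntoGame consensus mechanic.
    # The ontology slice and scoring below are illustrative assumptions.

    # A toy slice of an upper ontology: each category maps to its subcategories.
    UPPER_ONTOLOGY = {
        "Thing": ["Abstract Concept", "Agent", "Happening"],
        "Agent": ["Person", "Organization"],
        "Happening": ["Event", "Process"],
    }

    def play_round(abstract, player_a, player_b):
        """Walk both players down the ontology until it bottoms out
        or the players disagree."""
        current, score = "Thing", 0
        while current in UPPER_ONTOLOGY:          # stop at a leaf category
            choices = UPPER_ONTOLOGY[current]
            pick_a = player_a.categorize(abstract, choices)
            pick_b = player_b.categorize(abstract, choices)
            if pick_a != pick_b:                  # consensus lost; round ends
                break
            score += 1                            # both players earn a point
            current = pick_a                      # descend to the agreed category
        return current, score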

Gameplay

In terms of gameplay, it feels a little rough, in part because the game doesn’t choose articles very intelligently. (In one case, I got the same article twice in a row.) Also, after tagging 5-6 different articles, you have a good working knowledge of the upper ontology’s taxonomy, and it becomes less fun as the game devolves into categorization along lines you’ve seen many times before. The key difference from Google’s image-tagging game is that in Google’s game people enter free-form words, so the input is almost limitless. Oh, and one other thing – in order to categorize properly, you have to read the 2-3 sentence descriptions of what the categories mean, which can take some time the first time around when you have 6-7 categories to choose from.

These don’t appear to me to be fatal problems for the game, though – just teething problems. It could be fun if the data set were widened substantially, and the category choice perhaps narrowed a bit. And of course, in the background, they’re building an interesting data set mapping Wikipedia articles to the high-level concepts they represent.

Background

Here’s the original announcement email from Martin Hepp at DERI:

We are proud to release the first one in our series of online computer games that turn core tasks of weaving the Semantic Web into challenging and interesting entertainment – check it out today at http://www.ontogame.org/

A very early paper written in late Summer is in Springer LNCS Vol. 4806, 2007, pp. 1222-1232 [1].

A complete Technical Report including our quantitative evidence and video footage will be released shortly on our project Web page at http://www.ontogame.org/

The next series of games for other tasks of building the Semantic Web is already in the pipeline, so please stay tuned :-)

Please subscribe to our OntoGame mailing list if you want to be informed once new gaming scenarios or results are available. See [2] for details on how to subscribe.

What is it good for?
====================
Despite significant advancement in technology and tools, building ontologies, annotating data, and aligning multiple ontologies remain tasks that highly depend on human intelligence, both as a source of domain expertise and for making conceptual choices. This means that people need to contribute time, and sometimes other resources, to this endeavor.

As a novel solution, we have proposed to masquerade core tasks of weaving the Semantic Web behind on-line, multi-player game scenarios, in order to create proper incentives for humans to contribute. In doing so, we adopt the findings from the already famous “games with a purpose” by von Ahn, who has shown that presenting a useful task that requires human intelligence in the form of an on-line game can motivate a large number of people to work hard on this task – for free.

Since our first experiments in May 2007, we have gained preliminary evidence that (1) users are willing to dedicate a lot of time to those games, (2) are able to produce high-quality conceptual choices, and, by doing so, (3) can unknowingly weave the Semantic Web.

Acknowledgments: OntoGame is possible only thanks to the hard work of the OntoGame team – special thanks to Michael Waltl, Werner Huber, Andreas Klotz, Roberta Hart-Hiller, and David Peer for their dedication and continuous contributions! The work on OntoGame has been funded in part by the Austrian BMVIT/FFG under the FIT-IT Semantic Systems project myOntology (grant no. 812515/9284), http://www.myontology.org, which we gratefully acknowledge.

And now…. play and enjoy!

Best wishes

Martin Hepp and Katharina Siorpaes

“Unpacking” implicit data values

One of the areas where missing data is most often made explicit is in “unpacking” data values.

Recently, I’ve been working with a system called Trio, which integrates uncertainty information and lineage with data into a database. The idea is that you can express fuzzier tuples in a relation. Instead of saying “Mary saw a Toyota”, you can assign it 60% confidence, or even add an alternative – either Mary saw a Toyota, or Mary saw a Honda (but not both).
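Trio’s real data model (so-called x-tuples, queried through its own SQL extension) is richer than anything I’d sketch here, but a rough Python model of an uncertain tuple with mutually exclusive alternatives gives the flavor – the class and field names below are mine, not Trio’s:

    from dataclasses import dataclass

    @dataclass
    class Alternative:
        """One mutually exclusive reading of a tuple, with its confidence."""
        values: tuple          # e.g. ("Mary", "Toyota")
        confidence: float      # in (0, 1]

    @dataclass
    class UncertainTuple:
        """An uncertain tuple: at most one of its alternatives is true."""
        alternatives: list

    # "Mary saw a Toyota" with 60% confidence, or a Honda – but not both.
    sighting = UncertainTuple([
        Alternative(("Mary", "Toyota"), 0.6),
        Alternative(("Mary", "Honda"), 0.4),
    ])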

Trio separates the data items in a table into two groups: those that are certain, and those that are uncertain. Unsurprisingly, uncertain data items may have a confidence stored along with them. The more interesting case is what gets put into the “certain” category, and why. As far as I can tell, the “certain” table has no formal definition; it is simply the set of data items that don’t need a confidence associated with them. So in fact we’re “packing” several different use cases into the designation of a data item as “certain”:

  • Honest-to-god certain, meaning that it has a confidence rating of 100%
  • Certainty information is unknown, but assumed to be certain
  • Certainty information doesn’t apply

Many people will reasonably point out that it’s OK for there to be nulls, and for them to have some special meaning – it’s just important that their meaning be consistent. When the meaning of nulls can’t be consistent (because the semantics of the domain require that nulls differ in meaning based on context), you have a missing data problem. The common approach is then to “unpack” those null values and enumerate their actual meanings so that they can be stored alongside the data in the future.
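As a purely illustrative sketch of that unpacking (the reason codes below mirror the three “certain” cases above; none of this comes from Trio or caBIG), you can store an explicit status alongside each value instead of a bare null:

    from enum import Enum
    from typing import Optional

    class ConfidenceStatus(Enum):
        """Reasons that were previously packed into a missing confidence."""
        CERTAIN = "certain"                   # genuinely 100% confidence
        UNKNOWN_ASSUMED_CERTAIN = "unknown"   # no confidence info; assumed certain
        NOT_APPLICABLE = "n/a"                # confidence doesn't apply here

    def record(values: tuple, confidence: Optional[float],
               status: ConfidenceStatus) -> dict:
        """Store the value, its confidence (if any), and *why* it may be absent."""
        return {"values": values, "confidence": confidence,
                "status": status.value}

    # Instead of a bare null, the reason for the missing confidence is explicit:
    row = record(("Mary", "Toyota"), None, ConfidenceStatus.NOT_APPLICABLE)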

Other background – see also the caBIG “Missing Value Reasons” paper, and the flavors of null discussion.

Ontology Matching Book


Euzenat and Shvaiko’s book is devoted to ontology matching as a solution to the semantic heterogeneity problem faced by computer systems. Ontology matching finds correspondences between semantically related entities of different ontologies. These correspondences may stand for equivalence as well as other relations, such as consequence, subsumption, or disjointness between ontology entities. Many different matching solutions have been proposed so far from various viewpoints, e.g., databases, information systems, and artificial intelligence.

With Ontology Matching, researchers and practitioners will find a reference book that presents currently available work in a uniform framework. In particular, the presented work and techniques can equally be applied to database schema matching, catalog integration, XML schema matching and other related problems. The book presents the state of the art and the latest research results in ontology matching by providing a detailed account of matching techniques and matching systems in a systematic way from theoretical, practical and application perspectives.

Missing Data and Causal Chains

One of my colleagues today suggested an interesting way of looking at the problem of missing data. He referred back to a lot of work on process modeling, where people in essence try to “reverse engineer” existing business processes.

Let’s say you discover a process, and you are able to identify steps 1, 2, and 4. Obviously there was a step 3 somewhere, and the name of the game from that point becomes locating and describing step 3 – the missing data – of the process.

More broadly, steps in a process or information that is gathered are part of some causal chain. If you can identify the causal chain that the missing data belongs to, you at least have a framework for understanding how the missing data relates to other observations, and a starting point for asking the question “what does it mean that this information is missing?”  This causal chain might be thought of as the context surrounding the missing data.
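As a toy illustration of the “locate step 3” idea – assuming, unrealistically, that process steps carry explicit sequence numbers – finding the missing data reduces to finding gaps in the observed chain:

    def missing_steps(observed):
        """Given the observed step numbers of a process, report the gaps:
        the 'missing data' whose context is the surrounding causal chain."""
        seen = sorted(set(observed))
        gaps = []
        for earlier, later in zip(seen, seen[1:]):
            gaps.extend(range(earlier + 1, later))   # steps implied but not seen
        return gaps

    print(missing_steps([1, 2, 4]))  # -> [3]: step 3 must exist between 2 and 4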

Judea Pearl and Causality

Judea Pearl, one of the more influential authors on knowledge representation, causal reasoning, and AI, wrote a book in 2000 titled “Causality: Models, Reasoning, and Inference”. (Dr. Pearl is also, as it happens, Daniel Pearl’s father.)

On his page that describes why he wrote the book, he relates an interesting anecdote about the scientific community’s avoidance of discussion about causality, and how that is a problem. His book was intended to help “students of statistics who wonder why instructors are reluctant to discuss causality in class; and students of epidemiology who wonder why simple concepts such as ‘confounding’ are so terribly complex when expressed mathematically”. His summation:

“Causality is not mystical or metaphysical.
It can be understood in terms of simple processes,
and it can be expressed in a friendly mathematical
language, ready for computer analysis.”

Comparing Upper Ontology Definitions of “Information Resource”

Copia has an interesting post about comparing the definitions of “information resource” in various upper ontologies.

Just looking at the semantic differences in a single simple concept across all of these upper ontologies is a bit of a reality check.  I have read papers that claim the data interoperability problem will be solved by simply mapping all of the upper ontologies together, and having each domain describe its data in terms of a particular ontology that uses the upper ontology of its choice.  Looking at the complexity and subtlety of just “information resource” across these different conceptualizations makes that approach look pretty silly.

Trends in usage – data/information that fades from common usage

A very broad category of data that we’ve talked about before is everything that simply fades from common usage because of changes in convention.  This is certainly an obvious reason why something might not be there, but it is in some ways the reverse of information that is so commonly used that it is no longer explicitly noted, i.e., evolving standards.

In a subsequent post I’ll talk about specific examples, such as evolving knowledge and frames of reference in science, and the shift in standards for research.