“Unpacking” implicit data values

One of the areas where missing data is most often made explicit is in “unpacking” data values.

Recently, I’ve been working with a system called Trio, which integrates uncertainty information and lineage with data into a database. The idea is that you can express fuzzier tuples in a relation. Instead of saying “Mary saw a Toyota”, you can assign it 60% confidence, or even add an alternative – either Mary saw a Toyota, or Mary saw a Honda (but not both).

Trio separates each of the data items in a table into two groups; those that are certain, and those that are uncertain. Unsurprisingly, uncertain data items may have a confidence stored along with them. The more interesting case is what gets put into the “certain” category, and why. As far as I can tell, the “certain” table has no formal definition, it just is the set of data items that don’t need a confidence associated with them. So in fact we’re “packing” several different use cases into that designation of a data item as “certain”:

  • Honest-to-god certain, meaning that it has a confidence rating of 100%
  • Certainty information is unknown, but assumed to be certain
  • Certainty information doesn’t apply

Many people will reasonably point out that it’s OK for there to be nulls, and for them to have some special meaning, it’s just important that their meaning be consistent. When the meaning of nulls can’t be consistent (because the semantics of the domain require that nulls differ in meaning based on context) – then you have a missing data problem. The common approach is then to go “unpack” those null values and enumerate their actual meanings so that they can be stored alongside the data in the future.

Other background – see also the caBIG “Missing Value Reasons” paper, and the flavors of null discussion.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s