Notes on Questions about missing data

  • Which data is missing?
    • Usually, this should be smack-you-in-the-face obvious.  Something that is supposed to be there isn’t there.
    • Knowing that something is missing presumes that you knew it was supposed to be there.
    • “Lies of omission, and incomplete truths” – sometimes information that is missing isn’t just a blank, it’s the absence of precision or detail.
    • Context determines what should be there.  The producer’s context and the consumer’s context may not match, and the fact that something is missing is evidence for this context mismatch.
  • Where is it missing from?
    • What is the scope? Are we dealing with web pages, structured databases, PDF files, or the entire universe of data?
    • Do particular forms of data (structured, unstructured) have characteristics that lend themselves to analyzing questions about missing information?
    • Should the issue of things that are missing be narrowed to data (to the detriment of “information”)?
  • Why is the data missing?
    • An exhaustive taxonomy of missing value reasons is likely impossible, if you accept that the number of contexts is unlimited.
    • Still, a taxonomy may be able to generalize reasons into buckets and cover vast swaths of the “reason space” explaining why something is missing.
    • What level of analysis is most important?  Is it that an individual value in an individual observation is missing?  Is it that all values for a particular field are missing?  Or is it global absence (i.e. it’s not an individual data field that is missing, it’s the whole data asset)?
  • Why do we care that the data is missing?
    • What valuable contextual information would the missing data have provided?
    • Are we interested in drawing an in-model conclusion (e.g. what the value should be, or how that missingness impacts other values)?
    • Are we interested in drawing an out-model conclusion (e.g. where the conclusion’s impact is completely outside the data set where something was missing)?
  • Given the questions above, what conclusions can we draw?
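
The "level of analysis" question above can be made concrete with a small sketch. This is only an illustration under stated assumptions: records are plain Python dicts, `None` (or an absent key) stands for a missing value, and the function name `missingness_levels` and the field names in the usage note are hypothetical. It separates the three levels mentioned: a single missing value, a field that is missing everywhere, and a data asset that is absent altogether.

```python
def missingness_levels(records, expected_fields):
    """Report missingness at three levels: whole asset, whole field, single value.

    Assumption: a value counts as missing when it is None or its key is absent.
    """
    if not records:
        # Global absence: there is no data asset to inspect at all.
        return {
            "asset_missing": True,
            "fields_missing": list(expected_fields),
            "values_missing": [],
        }

    # Field level: every observation lacks a value for this field.
    fields_missing = [
        field for field in expected_fields
        if all(record.get(field) is None for record in records)
    ]

    # Value level: isolated gaps in fields that are otherwise populated.
    values_missing = [
        (index, field)
        for index, record in enumerate(records)
        for field in expected_fields
        if field not in fields_missing and record.get(field) is None
    ]

    return {
        "asset_missing": False,
        "fields_missing": fields_missing,
        "values_missing": values_missing,
    }
```

For example, given two records with expected fields `name`, `age`, and `email`, where `email` is empty in both and `age` is empty only in the first, the sketch would report `email` as missing at the field level and `(0, "age")` as missing at the value level. Note that the caller has to supply `expected_fields`: knowing that something is missing presumes knowing it was supposed to be there.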