Quick Summarization of Data Mining Approaches to Handling Missing Data

Data Mining Research Blog – Handling Missing Values

A few observations about these approaches, which illustrate the predispositions of data mining:

  • They don’t seem to care what the missing value actually is; they primarily care about the effect of the gap on the usefulness of the particular data observation (or row)
  • For expediency, they tend to assume that missing values are distributed in the same way as the non-missing (or observed) values
  • There is a focus on a large corpus of observations; the impact of the individual observation is small.
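
To make the second observation concrete, here is a minimal sketch of two common fill-in strategies that lean on that assumption. This is my own illustration, not code from the linked post; the function names and the `ages` data are hypothetical, and `None` stands in for a missing value.

```python
import random
import statistics


def impute_from_observed(values, rng):
    """Replace each None with a value drawn at random from the observed entries.

    Assumes the missing values follow the same distribution as the observed ones.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to draw from")
    return [v if v is not None else rng.choice(observed) for v in values]


def impute_with_mean(values):
    """Replace each None with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    mean = statistics.mean(observed)
    return [v if v is not None else mean for v in values]


# Hypothetical column with two missing entries.
ages = [34, None, 29, 41, None, 37]

rng = random.Random(0)  # seeded so the random draw is repeatable
filled_random = impute_from_observed(ages, rng)
filled_mean = impute_with_mean(ages)
```

Neither function asks why a value is missing. Both only try to keep the row usable, which is the first observation again.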

These are all reasonable constraints given what data mining is doing. As I come across themes like these, though, I’m trying to document them, because they would probably be interesting to contrast with an approach aimed at using missing values as an information channel.

