One of my colleagues today suggested an interesting way of looking at the problem of missing data. He referred back to a lot of work on process modeling, where people in essence try to “reverse engineer” existing business processes.
Let’s say you discover a process, and you are able to identify steps 1, 2, and 4 of the process. Obviously the analyst knows that there was a step 3 somewhere, and the name of the game from that point becomes locating and describing step 3 – the missing data – of the process.
More broadly, steps in a process or information that is gathered are part of some causal chain. If you can identify the causal chain that the missing data belongs to, you at least have a framework for understanding how the missing data relates to other observations, and a starting point for asking the question “what does it mean that this information is missing?” This causal chain might be thought of as the context surrounding the missing data.