History of an Idea: Missing Data

Entries from September 2007

Missing Data and Causal Chains

September 7, 2007

One of my colleagues today suggested an interesting way of looking at the problem of missing data. He referred back to a lot of work on process modeling, where people in essence try to “reverse engineer” existing business processes.

Let’s say you discover a process, and you are able to identify steps 1, 2, and 4 of it. The analyst knows that there must have been a step 3 somewhere, and the name of the game from that point becomes locating and describing step 3, the missing data of the process.

More broadly, steps in a process or information that is gathered are part of some causal chain. If you can identify the causal chain that the missing data belongs to, you at least have a framework for understanding how the missing data relates to other observations, and a starting point for asking the question “what does it mean that this information is missing?”  This causal chain might be thought of as the context surrounding the missing data.
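As a rough illustration of the step 1, 2, 4 example, here is a minimal Python sketch. It assumes the steps are simply numbered with integers, which real discovered processes rarely are, and the function name find_missing_steps is made up for this post. For each gap it reports the missing step together with the observed steps on either side, which is a crude stand-in for the "context" the causal chain provides.

```python
def find_missing_steps(observed):
    """Return (missing_step, previous_observed, next_observed) for each gap.

    Assumes steps are identified by consecutive integers (a simplification).
    """
    steps = sorted(set(observed))
    gaps = []
    # Walk over neighboring observed steps and list anything skipped between them.
    for prev, nxt in zip(steps, steps[1:]):
        for missing in range(prev + 1, nxt):
            gaps.append((missing, prev, nxt))
    return gaps


if __name__ == "__main__":
    # Steps 1, 2, and 4 were identified; step 3 sits between 2 and 4.
    print(find_missing_steps([1, 2, 4]))
```

The neighbors do not say what step 3 was, but they narrow down where to look and what the missing step has to connect.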

Categories: Context · Semantics

Judea Pearl and Causality

September 6, 2007

Judea Pearl, one of the more influential authors on knowledge representation, causal reasoning, and AI, wrote a book in 2000 titled “Causality: Models, Reasoning, and Inference”. (Dr. Pearl is also, as it happens, Daniel Pearl’s father.)

On the page describing why he wrote the book, he relates an interesting anecdote about the scientific community’s reluctance to discuss causality, and why that is a problem. His book was intended to help “students of statistics who wonder why instructors are reluctant to discuss causality in class; and students of epidemiology who wonder why simple concepts such as ‘confounding’ are so terribly complex when expressed mathematically”. His summation:

“Causality is not mystical or metaphysical.
It can be understood in terms of simple processes,
and it can be expressed in a friendly mathematical
language, ready for computer analysis.”

Categories: Links · Semantics

Comparing Upper Ontology Definitions of “Information Resource”

September 4, 2007

Copia has an interesting post about comparing the definitions of “information resource” in various upper ontologies.

Just looking at how the semantics of one simple concept differ across all of these upper ontologies is a bit of a reality check. I have read papers claiming that the data interoperability problem will be solved by simply mapping all of the upper ontologies together and having each domain describe its data in terms of a particular ontology built on the upper ontology of its choice. Looking at the complexity and subtlety of just “information resource” across these different conceptualizations makes that approach look pretty silly.

Categories: Data integration · Ontology · Semantics