Saturday, February 13, 2021

Causation and Correlation: confusion or simplification

 

     We often encounter statements in the media along the lines of "X causes Y", "G is the cause of Q", or "You need to stop H or Q will happen."

     Of course, the rejoinder often is "She did H and Q never happened." or "I have Xed all of my life and Y has never entered into the picture."

     This isn't to say that X, G, or H are not detrimental to health, the economy, the environment, or whatever other category they may be applied to. It is just that, as in many other situations involving health, politics, religion, and so forth, the word "cause" is not being used accurately.

     Simplification is not, in itself, a bad thing. Simplification can help in describing something for a wide audience with widely different backgrounds and experiences. But simplification should be recognized as such and not be considered "the whole truth". As mentioned in a prior blog, the Body Mass Index (BMI) number is a good indicator of healthy weight for about 80% of the population. And, as long as the other 20% are not penalized by use of the BMI, it is not a bad thing. Unfortunately, easily obtained numbers often get misused -- on purpose or through lack of effort.

     As mentioned in the previous blog, if you take two large, truly random groups of people, then you can statistically compare the groups when one group has had something done and the other, the "control group", has not. This group had H. The control group did not have H. Five percent more of the group with H developed Q compared to the control group. Is five percent a significant difference? It depends on the group sizes and on confidence in the comparability of the two groups. Beyond that, the actual number at which the difference becomes "statistically significant" should be determined by a statistician (which I am not).
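     To make the "it depends on group sizes" point concrete, here is a rough sketch in Python of one common way to compare two groups, a two-proportion z-test. It is only an illustration, not a substitute for a statistician. The function name, the 15% and 10% rates, and the group sizes are all made up for the example, and "five percent more" is assumed here to mean five percentage points.

```python
import math

def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Uses the pooled proportion for the standard error, which is the
    usual choice when testing "no difference between the groups".
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: 15% of the H group develop Q, 10% of the control group.
# Same five-point gap, two different group sizes.
z_small, p_small = two_proportion_z_test(15, 100, 10, 100)
z_large, p_large = two_proportion_z_test(300, 2000, 200, 2000)

print(f"100 per group:  z = {z_small:.2f}, p = {p_small:.4f}")
print(f"2000 per group: z = {z_large:.2f}, p = {p_large:.6f}")
```

     With the conventional 0.05 cutoff, the same five-point gap does not pass the test with 100 people per group but does with 2000 per group. Even then, passing the test only says the difference is unlikely to be chance. It says nothing about whether the two groups were truly comparable.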

     At this point, perhaps it is determined that it IS statistically significant. The media, in their desire to inform but strong tendency to oversimplify, will shout "H causes Q!" But what about all those other people in either group that did not develop Q?

     If I place an unshielded thumb on a concrete slab with no protections (note the careful listing of conditions) and hit it hard with a hammer, it will cause pain and it will cause an injury (of varying seriousness depending on additional factors). In this situation, since it will happen to anybody who is in this scenario, we can say "hitting an unprotected thumb on a concrete slab with a hammer causes pain and injury". No exceptions: if you do A, it causes B.

     Note, there can always be something that is not specified that might make this not true. What if the thumb was part of a strong prosthetic hand? What if the concrete slab had not set yet and the hand and thumb went into the concrete when the hammer hit? But in the everyday world, you can say that hitting the thumb with a hammer causes pain and injury. You can say that the earth's rotation causes the appearance of the sun coming up over the horizon to the east. In each case, causing is (for all practical purposes) 100% true for the specified situation.

     In these situations it is also very important to be specific. For example, there is the statement that "burning fossil fuels causes an increase in carbon dioxide and other waste products in the atmosphere". Seems obviously true, doesn't it? But what if the factory had a carbon dioxide "scrubber" on the stack and a reclamation chamber that captured particulates before release to the atmosphere? So a much more accurate statement (and one that can be addressed in more ways) is "burning fossil fuels without adequate filtering and reclamation causes an increase in carbon dioxide and other waste products in the atmosphere". It isn't the burning of the fossil fuels that is the foundational problem -- it is the non-cleansed release of the exhaust products. The number of possible solutions increases. We can change fuels or we can install filtration and reclamation equipment.

     In the situation where the group with H developed Q 5% more often than the group without H, it is NOT true that "H causes Q", no matter how much the media, or others, want to simplify the situation. What we have here is a correlation between H and the development of Q. A correlation can be insignificant (below the threshold that statisticians consider significant), significant (at the threshold), moderately significant, or strongly significant. But it cannot accurately be said that "H causes Q".

     And the inaccurate use of "cause" in saying "H causes Q" reduces the credibility of the message.

Interrupt Driven: Design and Alternatives

       It should not be surprising that there are many aspects of computer architecture which mirror how humans think and behave. Humans des...