The Causes of Correlation (Pournelle March 2013)

Posted by admin on April 12, 2013, from Jerry Pournelle's blog
Our small discussion of correlation and causation yesterday impelled Mike Flynn, who thinks a lot about this sort of thing (he’s a quality control expert, which means he is very much concerned with advanced studies in statistical inference), to write a short essay. It may tell you more about causation and correlation than you really wanted to know, but those who actually have to deal with such matters ought to know this sort of thing at this level:

The Causes of Correlation

Regarding recent comments on correlation and causation, a few observations:

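1. If X causes Y, then X and Y will be correlated IF a wide enough range of X is examined. Otherwise, it is possible for X and Y to appear uncorrelated: over a narrow range of X, the changes in Y that X produces can be swamped by variation from other causes.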
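Point 1 can be checked with a small simulation. This is a sketch under assumed conditions: the linear model Y = 2X + noise, the noise level, the sample size, and the `pearson` helper are all illustrative choices, not anything from the essay. Y genuinely depends on X, yet restricting attention to a narrow slice of X drives the correlation toward zero.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumed causal model: Y = 2*X plus noise from other causes.
xs = [rng.uniform(0, 100) for _ in range(2000)]
ys = [2 * x + rng.gauss(0, 20) for x in xs]
r_wide = pearson(xs, ys)

# Examine only a narrow slice of X, from 48 to 52.
narrow = [(x, y) for x, y in zip(xs, ys) if 48 <= x <= 52]
r_narrow = pearson([x for x, _ in narrow], [y for _, y in narrow])

print(f"full range of X: r = {r_wide:.2f}")
print(f"narrow slice:    r = {r_narrow:.2f}")
```

Widening or narrowing the `48 <= x <= 52` filter moves the correlation up or down while the causal model stays exactly the same.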

2. If Y causes X, there will likewise be a correlation, if a wide enough range of Y is examined; but researchers may be fooled into supposing that it is X that causes Y.

Example: in the famous case of the Storks of Oldenburg, an excellent correlation obtained between the population of Oldenburg, Germany, during the 1930s and the number of storks observed each year. Do storks bring babies? No, babies bring storks: as the town grew, more houses were built, resulting in more chimneys, and the European stork likes to build its nest in chimneys. So a growing town meant more nesting places, and more storks.
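Point 2 rests on the fact that the correlation coefficient is symmetric: r(X, Y) equals r(Y, X), so the number by itself says nothing about which way causation runs. The sketch below assumes an invented causal chain (town size drives chimneys, chimneys attract storks); the figures are made up for illustration and are not Oldenburg's records.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)

# Invented figures: a town that grows each year, chimneys that track its
# size, and storks that track the number of chimneys (their nesting sites).
town_size = [1000 + 50 * year + rng.gauss(0, 20) for year in range(10)]
chimneys = [s / 10 + rng.gauss(0, 3) for s in town_size]
storks = [c / 5 + rng.gauss(0, 1) for c in chimneys]

r_forward = pearson(town_size, storks)
r_backward = pearson(storks, town_size)

# Swapping the arguments gives the same value, so r alone cannot say
# whether storks bring babies or babies bring storks.
print(f"r(town size, storks) = {r_forward:.3f}")
print(f"r(storks, town size) = {r_backward:.3f}")
```

Only knowledge of the mechanism (chimneys as nesting sites) settles the direction; the data alone are silent on it.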

3. If Z causes both X and Y, there will be a correlation between X and Y even though neither one causes the other.

Example: in a chemical reaction, low process yields (Y) were correlated with high pressure in the vessel (X). The suggestion to increase yields by lowering the pressure was met with scorn, because there was an impurity in the raw material (Z) that interfered with the reaction and lowered yields AND also caused frothing in the vessel. The standard operating procedure instructed the operator to combat frothing by increasing the pressure to hold down the foam. So low yields and high pressure were associated, but manipulating one would not change the other. Both were effects, not causes, in this context.
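The reactor example can be mimicked with a hypothetical model in which the impurity Z drives both yield and pressure, and pressure has no effect on yield at all. The coefficients, the 30% impurity rate, and the noise levels are assumptions chosen for illustration. Pooling all batches shows a strong negative correlation; looking only at clean batches (holding Z fixed, a simple form of stratification) shows essentially none.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(3)

# Assumed model: impurity Z is present in about 30% of batches. Z lowers the
# yield and causes frothing; the operating procedure answers froth with more
# pressure. Pressure itself never appears in the yield formula.
batches = []
for _ in range(2000):
    z = 1 if rng.random() < 0.3 else 0
    yield_pct = 90 - 15 * z + rng.gauss(0, 3)
    pressure = 2.0 + 1.0 * z + rng.gauss(0, 0.3)
    batches.append((z, yield_pct, pressure))

# Pooled over all batches: high pressure goes with low yield.
r_pooled = pearson([p for _, _, p in batches], [y for _, y, _ in batches])

# Holding Z fixed (clean batches only): the association vanishes.
clean = [(y, p) for z, y, p in batches if z == 0]
r_clean = pearson([p for _, p in clean], [y for y, _ in clean])

print(f"all batches:   r = {r_pooled:.2f}")
print(f"clean batches: r = {r_clean:.2f}")
```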

4. If X and Y are both on a trend or cycle during the same time period, the respective time series will correlate even if there is no causal connection.

Examples:

* Columbia River salmon runs go up and down in roughly eleven-year cycles. So do sunspots on the sun. Do sunspots cause salmon? Do salmon cause sunspots? Is there a lurking Z that makes salmon eager to spawn AND causes the sun to boil?

* An example I used in training classes: the percentage of women participating in the labor force (X) has been increasing smoothly since the 1880s, and the percentage of domestic car sales that were foreign cars (Y) increased from 1955 to 1990. Over that period the correlation between X and Y was in the high 0.90s. Does this mean that we can save Detroit by getting women back in the kitchens? Or only that two trends will always correlate?

* If global temperatures are increasing and atmospheric CO2 is increasing during the same time frame, they will correlate.
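The trend effect in point 4 is easy to reproduce with two series whose only shared feature is an upward drift over time. The slopes, noise, and series length below are arbitrary assumptions. The levels correlate almost perfectly; after taking first differences (one common way to strip out a linear trend, swapped in here as an illustration rather than taken from the essay) the correlation largely disappears.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(11)

# Two assumed series with independent noise; the only thing they share is
# that both drift upward over the same 1,000 periods.
periods = 1000
x = [0.5 * t + rng.gauss(0, 1) for t in range(periods)]
y = [0.3 * t + rng.gauss(0, 1) for t in range(periods)]
r_levels = pearson(x, y)

# First differences remove a linear trend, leaving only the unrelated noise.
dx = [b - a for a, b in zip(x, x[1:])]
dy = [b - a for a, b in zip(y, y[1:])]
r_diffs = pearson(dx, dy)

print(f"levels:      r = {r_levels:.3f}")
print(f"differences: r = {r_diffs:.3f}")
```

Differencing is only a first check: it handles a shared linear trend, but shared cycles (salmon and sunspots) need other treatment.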

5. Coincidence. There was a longstanding correlation between the size of the universe and the size of my suits. Space was expanding, and so was I. But if I lost weight, would the universe begin to contract? Hemlines and stock prices are another classic example.

Example: "Science Can Tell If You’re A Racist Just By Looking At You," http://wmbriggs.com/blog/?p=7407
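Coincidence is most dangerous in small samples. The sketch below, with arbitrary assumed sizes (1,000 pairs of series, eight observations each), generates pairs of completely unrelated random series and counts how many show |r| above 0.7. With only eight points, roughly one pair in twenty clears that bar by chance alone, since about 0.707 is the two-sided 5% critical value of r for eight observations.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(5)
trials = 1000  # pairs of series to try
n_obs = 8      # observations per series

# Count pairs of independent series that nonetheless look strongly related.
strong = 0
for _ in range(trials):
    xs = [rng.gauss(0, 1) for _ in range(n_obs)]
    ys = [rng.gauss(0, 1) for _ in range(n_obs)]
    if abs(pearson(xs, ys)) > 0.7:
        strong += 1

print(f"{strong} of {trials} unrelated pairs have |r| > 0.7")
```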

6. The Unabomber Effect in Multiple Correlation. When the Unabomber taught math at Berkeley, he said that given seven independent variables (X1, …, X7) you can fit any finite set of data (Y). It’s only a matter of finding the right coefficients. (It might not survive new data; but then you simply re-analyze and come up with a new set of coefficients and, presto, you get another fit.) This could become an enormous problem with Big Data and automated data mining and adjustment.
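The Unabomber remark echoes a real property of least squares: a linear model with as many coefficients as data points can pass exactly through every point (provided the design matrix is nonsingular), whatever Y is. The sketch below assumes seven predictors plus an intercept (eight coefficients) and eight observations of pure noise; the `solve`, `design_row`, `sse`, and `noise_data` helpers are written for this example only. The fit is perfect, and the perfection does not carry over to fresh data from the same process.

```python
import random


def solve(a, b):
    """Solve the square linear system a * beta = b by Gaussian elimination
    with partial pivoting (a small helper written for this example)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def design_row(xs):
    """Intercept column followed by the predictor values."""
    return [1.0] + list(xs)


def sse(beta, rows, targets):
    """Sum of squared errors of the linear model beta on the given data."""
    total = 0.0
    for xs, t in zip(rows, targets):
        pred = sum(w * v for w, v in zip(beta, design_row(xs)))
        total += (pred - t) ** 2
    return total


rng = random.Random(42)
k = 7      # seven predictors, X1..X7
n = k + 1  # eight observations: as many as there are coefficients


def noise_data():
    """Y and X1..X7 drawn independently: there is nothing real to find."""
    xs = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    return xs, ys


X, Y = noise_data()
beta = solve([design_row(row) for row in X], Y)
fit_sse = sse(beta, X, Y)

X_new, Y_new = noise_data()  # fresh data from the same unrelated process
new_sse = sse(beta, X_new, Y_new)

print(f"SSE on the data used for fitting: {fit_sse:.2e}")
print(f"SSE on fresh data:                {new_sse:.2f}")
```

With more than eight observations the fit is no longer exact, but each added predictor still can only raise the in-sample fit, which is the same trap in milder form.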

Actually it is already an enormous problem with Big Data and automated data mining. But you know that. Statistical dragnets can find a lot of interesting correlations. Treat them as hypotheses to be tested and you may learn something. And every now and then an unexpected correlation does lead to real discoveries, which is why keeping careful case histories is so important to medicine.
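The advice to treat dragnet correlations as hypotheses can be sketched as a split-sample check: search one half of the data for the strongest correlation, then test that single pre-chosen variable on the other half. Everything here is simulated noise, and the sizes (500 candidate variables, 50 observations per half) are arbitrary assumptions. The winning variable typically looks impressive in the discovery half and shrinks toward zero on validation.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2013)
n_half = 50   # observations in each half of the data
n_vars = 500  # candidate variables swept by the dragnet


def draw():
    """One half of a data set in which the outcome and every candidate
    variable are independent noise."""
    y = [rng.gauss(0, 1) for _ in range(n_half)]
    xs = [[rng.gauss(0, 1) for _ in range(n_half)] for _ in range(n_vars)]
    return y, xs


y_disc, x_disc = draw()  # discovery half: run the dragnet here
y_val, x_val = draw()    # validation half: test the single "finding" here

# Dragnet: pick the candidate with the strongest correlation in discovery data.
best = max(range(n_vars), key=lambda j: abs(pearson(x_disc[j], y_disc)))
r_disc = pearson(x_disc[best], y_disc)

# Treat it as a hypothesis: test that one variable on data it has not seen.
r_val = pearson(x_val[best], y_val)

print(f"candidate #{best}: discovery r = {r_disc:.2f}, validation r = {r_val:.2f}")
```

The design choice that matters is committing to the candidate before looking at the validation half; re-running the search there would simply repeat the dragnet.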

Copyright © 2013-2024 Hofman.org All rights reserved.