Each Excel spreadsheet is designed to hold about 17 billion cells (16,384 columns × 1,048,576 rows). And if we add one thousand sheets, a single Excel workbook can hold more than 17,179,869,184,000 cells. It is an almost unthinkable amount of data. It takes us far from the “I don’t have enough space” problem, but it confronts us with a new question: “what do I do with all of it?”
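The totals above follow from Excel’s per-worksheet limits since Excel 2007 (16,384 columns and 1,048,576 rows), which are both powers of two. The thousand-sheet figure is an assumed scenario rather than a fixed Excel limit, since the number of sheets in a workbook is bounded by available memory:

```latex
\begin{aligned}
16{,}384 \times 1{,}048{,}576 &= 2^{14} \times 2^{20} = 2^{34} = 17{,}179{,}869{,}184 \text{ cells per sheet} \\
17{,}179{,}869{,}184 \times 1{,}000 &= 17{,}179{,}869{,}184{,}000 \text{ cells in 1,000 sheets}
\end{aligned}
```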
A first answer, perhaps the most interesting one, could be: “let’s play”. Let’s use this tool to relate data that would not otherwise be related. That is how some previously unnoticed correlations emerged, such as the correlation between underwear sales and economic crises, the curious correlation between US oil imports and chicken consumption, or, my favorite, the close correlation between a country’s chocolate consumption and its number of Nobel Prize winners.
Until now nobody had ventured into these topics, mainly, I assume, for “pragmatic” rather than “theoretical” reasons. In the past, when we said “it is hard to process data”, we were referring fundamentally to a practical problem: it took hundreds of sheets of paper, pens, space and, above all, people (not only to record the data, but to operate on them: add, subtract, divide, apply functions, etc.). Today that problem is gone. New technologies have overcome the technical issue. The difficulty now lies in theory, and the great problem to address is determining what relevant, useful knowledge we can obtain from so much data.
The Causality Problem
The entertaining correlations that keep appearing revive an old debate known as the causality problem. Although discussions about causality are broad, the use of data tools has sharpened the tension between correlation and causality, and in both academic and everyday conversations one increasingly hears the inexorable distinction: Correlation is not Causality.
To put it in simple terms, correlation refers to two events that occur together, while causality involves something more: it indicates that one of these events is caused (generated) by the other. So when we analyse the correlation between A and B (for example, the rooster’s crow and the sunrise), we initially don’t know whether (1) A causes B (the rooster’s crow makes the sun rise), (2) B causes A (the sunrise makes the rooster crow), or (3) a third factor C (the Earth’s position) causes both A and B (the Earth’s position makes the rooster crow and the sun rise). In other words, we know that A and B move in sync, but we don’t know whether A→B, B→A, or C→A and C→B.
Although it looks like a purely theoretical-philosophical problem, it has real consequences for how knowledge is applied. Continuing with the example, if someone wanted a cold, sunless day and believed that the correlation between the rooster’s crow and the sunrise was due to causality running from the crow to the sunrise (the rooster crows → the sun rises), they would surely try to find a way to silence every rooster in town. Or, if someone saw causality in the strong correlation between chocolate consumption and Nobel Prizes, they might start including chocolate in their daily diet (just in case, I already have).
Oops, I suspect these trivial examples do not convey the relevance of the problem well, so let’s analyze a topic we worry about more: the unbearable phenomenon called inflation.
Emission and Inflation.
Although it may seem normal to us, Argentina has for a long time been living with a fairly atypical economic phenomenon: prices rise every month. The curious thing is that, although there is consensus that inflation is a problem, there is a great debate (perhaps only in Argentina) about its origin: is it due to monetary emission, market concentration, the wickedness of businessmen, or the Holy Spirit? Within this theoretical debate there is an undeniable empirical fact: emission and inflation show a very strong statistical correlation. However, since correlation does not imply causality, those who argue that high levels of emission cause high inflation are accused of being neoliberal orthodox monetarists (very popular insults in Argentina).
Leaving aside the oddity of the language, I would like to argue here that, at least regarding inflation, ***the debate around causality matters more in gnoseological terms than in pragmatic terms.*** This means that even though we could spend a lifetime discussing the causality problem, in this case, as in many others, it is not necessary to solve it in order to resolve the inflation issue. That’s why I insist that inflation is a problem already solved: here is the global evidence, and here the Chilean case. In short, and as odd as it may seem, what I’m saying is that even though causality remains an unsolved theoretical problem, reducing inflation requires decreasing emission levels.
Correlation vs Causality: smoking and lung cancer.
As a final thought, I would like to share something that happened in a distant city during the 1950s, when it was still being debated whether cigarettes caused lung cancer. I hope it helps fight a disease the Argentine economy is suffering from.
For a long time, epidemiologists observed a strong correlation between cigarette consumption and lung cancer. However, they ran into the famous problem: “correlation is not causality”.
*Many were convinced that tobacco caused lung cancer, but others supported reverse causality: people who had lung cancer smoked more (perhaps to calm their desperation).*
A third group argued that there was a common cause: the correlation between lung cancer and smoking did not imply causality in either direction; rather, a deeper cause (distress combined with anxiety) produced both lung cancer and the desire to smoke in people with the same characteristics.
Certainly, establishing causal relationships is, strictly speaking, a difficult (if not impossible) problem. In the meantime, a group of doctors, unaware of (or tired of) the philosophical entanglements, recommended that their patients reduce their tobacco dose.