
Thursday, October 4, 2012

Causation, Correlation, and Job Creation

I had an obvious thought just now, and then decided to google it just after writing the title, as opposed to after writing the post.   I see some people on Straightdope have made this point, but I think it bears repeating here and elsewhere.

I didn't say it in so many words in my previous posts, but it's worth pointing out that the "rich people as job creators" myth is not just empirically silly, it's also an example of the correlation/causation fallacy.  While it's true that people who create a lot of jobs are typically rich (or on their way to becoming rich), that does NOT mean that any given rich person is likely to be a job creator.

Much of Mitt Romney's platform is based on the fallacy that giving MORE money to rich people will result in more jobs being created.

The only reason that Republicans are in this race at all is because they appeal to rich people, who can spend a lot of money to try to persuade the uneducated and unwashed that they know best how to solve the problem.

But look at it this way.  If you had a billion dollars and wanted it to create jobs, who would you give it to?

(1) Paris Hilton, with no strings attached


(2) A government agency given the specific task of using the money to create useful, lasting jobs.

I admit I cringed as I wrote (2) above -- I don't think the government has done a particularly good job at creating jobs.  But I think we (even the uneducated unwashed) can agree that the government would do a better job than Paris Hilton.

UPDATE 10/03/12

Just saw the following headline in Slate:

The Internet Blowhard’s Favorite Phrase

Why do people love to say that correlation does not imply causation?

So does that make me an "internet blowhard," or is the fact that I just used the phrase above another example of the correlation/causation fallacy?

The article is by Daniel Engber, and I've gone through it a couple of times and am not sure I get his point.  The short story is that he seems to acknowledge the prevalence of, and the problems with, the fallacy, but he calls people who point it out "blowhards" and seems to advocate that we stop using the "catch-phrase" altogether.  Go figure.

He starts off on a confusing note by citing a study which apparently found that depressed college students send more email and IMs and do more file-sharing than do non-depressed college students.  In other words, the study found a correlation.  He doesn't say that the study somehow concluded that these kinds of internet activity caused depression, or that depression caused these kinds of internet activity.  If he had, that would have explained his next sentence, which was that "Not everyone found the news believable."

But so far, the news is just that there was a correlation.  From his report, I don't see any indication that anyone disbelieved the correlation.  One has to click on his links to see what the article reporting the study actually said, and what the commenters were reacting to.  In short, it was this:

"The researchers recommend using these primary findings to further identify correlations between Internet usage and other mental health disorders including anorexia, bulimia, ADHD and schizophrenia. The study’s authors hope to use the findings to apply future software applications that could warn Internet users if they are displaying depression symptoms online."

Commenters were reacting to what they thought was an assumption that an increase in certain web behaviors that are correlated with depression actually justifies warning people that they might be depressed.  Perhaps the commenters were over-reacting a little bit -- as Engber later points out, correlations often are useful indicators.  [As an aside, I generally agree with the commenters that the fallacy is at work here, but I also think the software is silly for other, even more important, reasons.]

So he started off on a bad note by not fully explaining his example.  He then goes on with stuff like this:

"So how did a stats-class admonition become so misused and so widespread? What made this simple caveat—a warning not to fall too hard for correlation coefficients—into a coup de grace for second-rate debates?"

He never provides data for his assumption that the admonition has become "so misused."  So far, he's only given us one example, and in that example, the internet blowhards citing the fallacy were simply pointing out the obvious fact that just because a person sends more email, that doesn't mean he's depressed.  Although this doesn't finally answer the question of whether or not the software will be useful, it's perfectly fair to point it out -- I don't think it's a "misuse" of the admonition.  But in any event, the reader of Engber's article can't judge whether or not it's misused, because he hasn't taken the trouble to explain how they were using it.

He then goes on to report on the origins of the phrase, and the fact that it has become significantly more widespread in the "computer" age.

I am slowly beginning to think that this article is satire -- he seems to be linking correlation after correlation to causation after causation.  Let's take a closer look:

"Those first, modest peaks of 'correlation is not causation' show up in print in the 1890s—a date that happens to coincide with the discovery of correlation itself.  That's when the British statistician Karl Pearson introduced a powerful idea in math: that a relationship between two variables could be characterized according to its strength and expressed in numbers."

The second sentence more or less refutes the first one here.  It's not that "correlation" was "discovered" in the 1890s, it's just that someone figured out a way to use math to show the strength of a correlation.  But it's probably true that Pearson's work paved the way for studies that pointed to correlations, and made assumptions about causation, which might have caused others to point to the correlation/causation fallacy.  Engber seems to acknowledge this: "As correlations split and multiplied, we needed to remind ourselves of what they meant and what they didn't."

He then provides an interesting if somewhat pointless graph showing that usage of "correlation" has increased since 1890, whereas usage of "causation" has stayed relatively constant:


Not a surprising graph given that the increase in usage started when the math was worked out.  The increase just shows that the concept caught on, and entered the popular vocabulary in a way that "causation" has not. If you think about it, people today use the word "correlation" a lot more than they use the word "causation."  (As an aside, he invites "someone else [to] explain why correlations have been trending downward since 1976."  I can't answer that definitively here, but I note that there has been an explosion of texts, as well as new words, since 1976.  This is apparently just measuring the percentage of the time that any given word in a book (on average) happens to be "correlation."  If the overall vocabulary is increasing, the percentage usage of any given word will go down.  Likewise, it might be explained by a differential growth of types of literature that are less likely to use the word "correlation" than the types of literature that were sampled for 1976.)
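Here is a quick sketch of that dilution arithmetic in Python. The counts are made up purely for illustration (they are not the real figures behind Engber's graph), and `share_percent` is just a name I picked:

```python
# Hypothetical counts, chosen only to illustrate the arithmetic.
def share_percent(word_count, total_words):
    """Percentage of all printed words that are the given word."""
    return 100 * word_count / total_words

# Earlier year: the word appears 50 times in 1,000,000 printed words.
earlier = share_percent(50, 1_000_000)

# Later year: the word appears MORE often (80 times),
# but the total amount of printed text has quadrupled.
later = share_percent(80, 4_000_000)

print(f"earlier share: {earlier:.4f}%")
print(f"later share:   {later:.4f}%")
```

With these invented numbers the word is used more often in absolute terms, yet its share of all words falls. That is all I mean by saying a downward trend in the percentage doesn't have to mean people stopped talking about correlations.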

He then points out that "in the present day, . . . Google, Amazon, and the other data juggernauts belch smoggy clouds of information and spit out correlations by the ton" and quotes someone as saying "To them [Amazon, etc.], perhaps, automated number-crunching stands for the highest form of knowledge that civilization has ever produced."  And then Engber speculates:  "In that sense, the admonitory slogan about correlation and causation isn't so much a comment posted on the Internet as a comment posted about the Internet. It's a tiny fist raised in protest against Big Data."

Hard to tell where he wants to go with this; he seems to have become so enthralled by his metaphors that he forgot to link them to his thesis.  I would guess that people who point to the fallacy are typically pointing out a misuse of "correlations," as I am doing above.  I'm not trying to shake my tiny fist at big data.  It's not a movement, it's just an occasional observation, made in reaction to bad logic (or, if misused, in reaction to perceived bad logic).

Engber then veers off to say that there are other limits on the utility of statistics as well -- e.g. that the threshold for "statistical significance" is arbitrarily set at 5% (i.e. a result is considered "significant" if a result at least that strong would turn up by pure chance less than 5% of the time).  BTW, this reminds me of a good cartoon on the subject:

(Since people who read this blog might not be as mathematically inclined as those who read xkcd: the point is that if there's a 5% chance of coincidence, and you test something 20 times, chances are that the coincidence will occur in at least one of your tests.)
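To put a number on "chances are," here is a minimal sketch in Python. It assumes the 20 tests are independent of each other and that there is no real effect in any of them; the variable names are my own.

```python
# Chance of at least one "significant" fluke across repeated tests.
# Assumptions: the tests are independent, and no real effect exists in any of them.
alpha = 0.05    # significance threshold (the conventional 5%)
n_tests = 20    # number of things tested

# Probability that every single test correctly comes up non-significant
p_no_fluke = (1 - alpha) ** n_tests

# Probability that at least one test is a false positive
p_at_least_one = 1 - p_no_fluke

# Average number of false positives across the 20 tests
expected_flukes = alpha * n_tests

print(f"P(at least one fluke) = {p_at_least_one:.2f}")
print(f"Expected flukes       = {expected_flukes:.1f}")
```

Under those assumptions, the chance of at least one coincidental "hit" comes out to roughly 64%, and on average you'd expect one fluke per 20 tests.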

Back to Engber -- he now wonders why people don't point out this problem with "significance" as often as they point out the problem with correlations:

"'Don't confuse statistical and substantive significance!' That comment-ready slogan would be just as much a conversation-stopper as correlation does not imply causation, yet people rarely say it. The spurious correlation stands apart from all the other foibles of statistics. It's the only one that's gone mainstream. Why?"

The answer is pretty obvious -- people abuse causation/correlation much more often, and more blatantly, than they "abuse" statistical significance.  

More rumination and the conclusion from Engber:

"I wonder if it has to do with what the foible represents. When we mistake correlation for causation, we find a cause that isn't there. Once upon a time, perhaps, these sorts of errors—false positives—were not so bad at all. If you ate a berry and got sick, you'd have been wise to imbue your data with some meaning. (Better safe than sorry.) Same goes for a red-hot coal: one touch and you've got all the correlations that you need. When the world is strange and scary, when nature bullies and confounds us, it's far worse to miss a link than it is to make one up. A false negative yields the greatest risk.

"Now conditions are reversed. We're the bullies over nature and less afraid of poison berries. When we make a claim about causation, it's not so we can hide out from the world but so we can intervene in it. A false positive means approving drugs that have no effect, or imposing regulations that make no difference, or wasting money in schemes to limit unemployment. As science grows more powerful and government more technocratic, the stakes of correlation—of counterfeit relationships and bogus findings—grow ever larger. The false positive is now more onerous than it's ever been. And all we have to fight it is a catchphrase."

So he has meandered back to the basic point -- that there currently IS a lot of abuse of correlation/causation going on, in all manner of discourse, and that this IS a problem.  If that's the case, why is he so against people trying to point it out?!

His final words -- "And all we have to fight it is a catchphrase" -- are baffling.  If overuse of correlations is a problem, and our only weapon is this catchphrase, why is he preaching unilateral disarmament?!  Why does he "correlate" use of this phrase with "Internet blowharded[ness]"?

It might be our only weapon against all these false correlations, but it's a pretty effective one -- it reminds people, in language that nearly everyone understands, that a correlation -- such as the fact that people who create jobs are typically rich -- says nothing about causation.  The state of being rich does not cause one to create jobs, and giving more money to the rich (by lowering their taxes) will not cause the rich to create more jobs.  What's odd is what I observed when I started this post -- that so few people on the internet had pointed out that the "job creator" myth is yet another example of the causation/correlation fallacy, even though that's so obviously what it is.

If you're one of those internet blowhards who like to point out causation/correlation fallacies, keep up the good work!

(And for the record, I'm still afraid of poison berries.)
