I had an obvious thought just now, and then decided to google it just after writing the title, as opposed to after writing the post. I see some people on Straightdope have made this point, but I think it bears repeating here and elsewhere.
I didn't say it in so many words in my previous posts, but it's worth pointing out that the "rich people as job creators" myth is not just empirically silly, it's also an example of the correlation/causation fallacy.
While it's true that people who create a lot of jobs are typically rich (or on their way to becoming rich), that does NOT mean that any given rich person is likely to be a job creator.
Much of Mitt Romney's platform is based on the fallacy that giving MORE money to rich people will result in more jobs being created.
The only reason that Republicans are in this race at all is that they appeal to rich people, who can spend a lot of money to try to persuade the uneducated and unwashed that they know best how to solve the problem.
But look at it this way. If you had a billion dollars and wanted it to create jobs, who would you give it to?
(1) Paris Hilton, with no strings attached
(2) A government agency given the specific task of using the money to create useful, lasting jobs.
I admit I cringed as I wrote (2) above -- I don't think the government has done a particularly good job at creating jobs. But I think we (even the uneducated unwashed) can agree that the government would do a better job than Paris Hilton.
Just saw the following headline in Slate:
The Internet Blowhard’s Favorite Phrase
Why do people love to say that correlation does not imply causation?
So does that make me an "internet blowhard," or is the fact that I just used the phrase above another example of the correlation/causation fallacy?
The article is by Daniel Engber, and I've gone through it a couple of times and am not sure I get his point. The short story is that he seems to acknowledge the prevalence of the fallacy and the problems it causes, but he calls people who point it out "blowhards" and seems to advocate that we stop using the "catch-phrase" altogether. Go figure.
He starts off on a confusing note by citing a study which apparently found that depressed college students send more email and IMs and do more file-sharing than do non-depressed college students. In other words, the study found a correlation. He doesn't say that the study somehow concluded that these kinds of internet activity caused depression, or that depression caused these kinds of internet activity. If he had, that would have explained his next sentence, which was that "Not everyone found the news believable."
But so far, the news is just that there was a correlation. From his report, I don't see any indication that anyone disbelieved the correlation. One has to click on his links to see what the article reporting the study actually said, and what the commenters were reacting to. In short, it was this:
"The researchers recommend using these primary findings to further identify correlations between Internet usage and other mental health disorders including anorexia, bulimia, ADHD and schizophrenia. The study’s authors hope to use the findings to apply future software applications that could warn Internet users if they are displaying depression symptoms online."
Commenters were reacting to what they thought was an assumption that an increase in certain web behaviors that are correlated with depression actually justifies warning people that they might be depressed. Perhaps the commenters were over-reacting a little bit -- as Engber later points out, correlations often are useful indicators. [as an aside, I generally agree with the commenters that the fallacy is at work here, but I also think the software is silly for other, even more important, reasons]
So he started off on a bad note by not fully explaining his example. He then goes on with stuff like this:
"So how did a stats-class admonition become so misused and so widespread? What made this simple caveat—a warning not to fall too hard for correlation coefficients—into a coup de grace for second-rate debates?"
He never provides data for his assumption that the admonition has become "so misused." So far, he's only given us one example, and in that example, the internet blowhards citing the fallacy were simply pointing out the obvious fact that just because a person sends more email, that doesn't mean he's depressed. Although this doesn't finally answer the question of whether or not the software will be useful, it's perfectly fair to point it out -- I don't think it's a "misuse" of the admonition. But in any event, the reader of Engber's article can't judge whether or not it's misused, because he hasn't taken the trouble to explain how they were using it.
He then goes on to report on the origins of the phrase, and the fact that it has become significantly more widespread in the "computer" age.
I am slowly beginning to think that this article is satire -- he seems to be linking correlation after correlation to causation after causation. Let's take a closer look:
"Those first, modest peaks of 'correlation is not causation' show up in print in the 1890s—a date that happens to coincide with the discovery of correlation itself. That's when the British statistician Karl Pearson introduced a powerful idea in math: that a relationship between two variables could be characterized according to its strength and expressed in numbers."
The second sentence more or less refutes the first one here. It's not that "correlation" was "discovered" in the 1890s, it's just that someone figured out a way to use math to show the strength of a correlation. But it's probably true that Pearson's work paved the way for studies that pointed to correlations, and made assumptions about causation, which might have caused others to point to the correlation/causation fallacy. Engber seems to acknowledge this: "As correlations split and multiplied, we needed to remind ourselves of what they meant and what they didn't."
He then provides an interesting if somewhat pointless graph showing that usage of "correlation" has increased since 1890, whereas "causation" has stayed relatively constant:
Not a surprising graph given that the increase in usage started when the math was worked out. The increase just shows that the concept caught on, and entered the popular vocabulary in a way that "causation" has not. If you think about it, people today use the word "correlation" a lot more than they use the word "causation." (As an aside, he invites "someone else [to] explain why correlations have been trending downward since 1976." I can't answer that definitively here, but I note that there has been an explosion of texts, as well as new words, since 1976. The graph apparently measures the percentage of all words in the sampled books that happen to be "correlation." If the overall vocabulary is increasing, the percentage usage of any given word will tend to go down. Likewise, it might be explained by a differential growth of types of literature that are less likely to use the word "correlation" than the types of literature that were sampled for 1976.)
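To make the dilution point concrete, here's a toy calculation (all the numbers are made up purely for illustration, not taken from any actual corpus): if a word's absolute count holds steady while the total sampled corpus grows, its per-word frequency, which is what these graphs plot, necessarily falls.

```python
# Toy, illustrative numbers: a word used the same number of times
# while the sampled corpus grows larger and more varied.
occurrences = 5_000            # hypothetical "correlation" mentions, held constant
corpus_1976 = 100_000_000      # total words sampled in the earlier year
corpus_later = 250_000_000     # a larger later corpus

freq_1976 = occurrences / corpus_1976
freq_later = occurrences / corpus_later

print(f"1976 frequency:  {freq_1976:.8f}")   # 0.00005000
print(f"later frequency: {freq_later:.8f}")  # 0.00002000
# The word is used just as often in absolute terms, yet its plotted
# share of the corpus drops by 60%.
```

So a downward-sloping frequency curve by itself can't distinguish "people use this word less" from "people write more about other things."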
He then points out that "in the present day, . . . Google, Amazon, and the other data juggernauts belch smoggy clouds of information and spit out correlations by the ton" and quotes someone as saying "To them [Amazon, etc.], perhaps, automated number-crunching stands for the highest form of knowledge that civilization has ever produced." And then Engber speculates: "In that sense, the admonitory slogan about correlation and causation isn't so much a comment posted on the Internet as a comment posted about the Internet. It's a tiny fist raised in protest against Big Data."
Hard to tell where he wants to go with this; he seems to have become so enthralled by his metaphors that he forgot to link them to his thesis. I would guess that people who point to the fallacy are typically pointing out a misuse of "correlations," as I am doing above. I'm not trying to shake my tiny fist at big data. It's not a movement, it's just an occasional observation, made in reaction to bad logic (or, if misused, in reaction to perceived bad logic).
Engber then veers off to say that there are other limits on the utility of statistics as well -- e.g. that the term "statistical significance" is arbitrarily set at 5% (i.e. a result is considered "significant" if the chance that it occurred randomly is less than 5%). BTW, this reminds me of a good xkcd.com cartoon on the subject:
That's http://xkcd.com/882/ (since people who read this blog might not be as mathematically inclined as those who read xkcd, the point is that if each test has a 5% chance of a coincidental "significant" result, and you run the test 20 times, there's roughly a 64% chance that the coincidence will show up in at least one of your tests).
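The arithmetic behind that cartoon can be sketched in a few lines (the 5% threshold and 20 tests come from the xkcd example; the simulation is just a sanity check of the formula):

```python
import random

# Analytic answer: if each test has a 5% false-positive rate, the chance
# that at least one of 20 independent tests comes up "significant" by
# pure coincidence is 1 - 0.95^20.
p_at_least_one = 1 - 0.95 ** 20
print(f"P(at least one false positive in 20 tests) = {p_at_least_one:.2f}")

# Simulation of the same setup: run 20 null tests many times over and
# count how often at least one crosses the 5% threshold.
random.seed(0)
trials = 100_000
hits = sum(
    any(random.random() < 0.05 for _ in range(20))
    for _ in range(trials)
)
print(f"Simulated estimate: {hits / trials:.2f}")
```

Both come out around 0.64, which is why running twenty jelly-bean colors through the same 5% test is practically guaranteed to "find" something.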
Back to Engber -- he now wonders why people don't point out this problem with "significance" as often as they point out the problem with correlations:
If you're one of those internet blowhards that likes to point out causation/correlation fallacies, keep up the good work!