July 01, 2008

Comments

The "statistical dead heat" has long been a pet peeve. Why the hell can't reporters learn just the tiniest little bit of statistics before they start reporting on poll results?

I hate CNN more than Fox. Unlike Fox, it does a fairly good job of appearing "fair and balanced" and endlessly touts its quality (the BEST political team).

And I can't wait to see Wolf admit that Obama wins, even after he has done backflips for a year showing how Clinton would win, and now, of course, McCain.

Oh, and John King is a tool.

Why the hell can't reporters learn just the tiniest little bit of statistics before they start reporting on poll results?

For the same reason that reporters can't learn that inflation exists before reporting on economics: people who are smart enough and inclined enough to learn such things rarely become journalists and almost never become editors.

In most serious enterprises, when you have generalist writers writing about complex topics in which they're not experts, their writing gets reviewed by experts before release. Of course that doesn't happen in journalism. The thought of running articles about polls by a statistics grad student for even a perfunctory review is literally unthinkable. Just like articles about economics never get reviewed by economists, articles about global warming never get reviewed by scientists, etc., etc. It is almost as if journalists have designed their workflow with the goal of deceiving their readers. If these institutions cared about accuracy, they would be structured very differently, but accuracy has no effect on their bottom line, so they dispense with it.

So how will the right-wing turn this into another example of CNN as the "Communist News Network"? Because they will. You know they will.

We can all bitch and moan about the ignorance of reporters and editors when it comes to statistics, or how CNN has a bias against dems or for repubs. But it's all just a waste of time.
The only reason there are headlines like this is that the media loves, wants, and (most of all) NEEDS the horserace. If everyone thinks Obama is extremely likely to win the election despite the daily give-and-take, why would anyone tune in?
It's the ratings, stupid.

Apparently I'm missing something (it's been a while since my last statistics course). I've outlined my understanding below; please correct me if/where I'm wrong.

The margin of error is the radius of a confidence interval (often 95%); it's 95% likely that the true value is less than the margin of error away from the sample value.

In this case, a 50-45 split with a 3.5% margin means that Obama's actual support is (95% likely to be) between 46.5% and 53.5%, and McCain's actual support is between 41.5 and 48.5%. So, while it's unlikely, McCain could be leading Obama by as much as 2 points, so it's a "statistical dead heat".
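The arithmetic in that reading can be checked with a minimal sketch. It simply adds and subtracts the reported margin from each candidate's share; the function name `interval` is made up for illustration, and the sketch takes the commenter's interpretation of the margin at face value without endorsing it.

```python
def interval(share, margin):
    """Return (low, high) in percentage points for a reported share and margin of error."""
    return share - margin, share + margin

# Figures from the comment above: a 50-45 split with a 3.5-point margin.
obama = interval(50.0, 3.5)
mccain = interval(45.0, 3.5)

# The two ranges overlap if Obama's low end is at or below McCain's high end.
overlap = obama[0] <= mccain[1]

# Largest McCain lead that is still inside both ranges.
max_mccain_lead = mccain[1] - obama[0]

print(obama, mccain, overlap, max_mccain_lead)
```

The overlap is what the "statistical dead heat" label rests on; it says nothing about how likely the extreme ends of both ranges are to occur together.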

Robert, that's what I was going to comment on. The one thing my stats professor used to hammer on is that if the range of values for the confidence interval overlapped you couldn't say the values were statistically different.

In this case, a 50-45 split with a 3.5% margin means that Obama's actual support is (95% likely to be) between 46.5% and 53.5%, and McCain's actual support is between 41.5 and 48.5%. So, while it's unlikely, McCain could be leading Obama by as much as 2 points, so it's a "statistical dead heat".

I don't think that's quite right, Robert. I think what it really means is that the computed percentage of those favoring McCain, for example, lies in a band of plus or minus 3.5% of the true value, not vice versa.

But it's early, and I haven't found the bottom of my first cup of coffee yet, so I could be mistaken.

Let's see now. The 95% margin of error is about two standard deviations. For McCain to be tied with or ahead of Obama, Obama's poll number would have to be too high by about 1.4 SDs. A quick check (using a one-sided Z test) indicates that the probability of that is about 8 or 9 percent.

I wouldn't call that a dead heat.

Jim, I think your estimate is too high. If the 95% margin of error is twice the SD, then the SD would be about 1.75 points, not 3.5, so 5 points would be 2.8 SDs, not 1.4.

KC: yes, but... The most favorable assumption for McCain is that his number rises by the same amount that Obama's falls; given that, Obama only has to lose 2.5 points for the tie. Call it a worst- (or best-) case scenario.
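The two estimates being traded here can be reproduced with a rough sketch. It assumes the 3.5-point margin is a 95% margin (1.96 standard deviations) and that polling error is approximately normal; the helper `one_sided_p` is a hypothetical name. The "symmetric" case is the best-case-for-McCain scenario described above, where his number rises by exactly as much as Obama's falls.

```python
from statistics import NormalDist

MARGIN_95 = 3.5        # reported margin of error, in percentage points
Z_95 = 1.96            # two-sided 95% critical value (the thread rounds this to 2)
sd = MARGIN_95 / Z_95  # about 1.79 points (the thread uses 1.75)

def one_sided_p(shift_points, sd_points):
    """Probability that a normal error exceeds shift_points in one direction."""
    z = shift_points / sd_points
    return 1.0 - NormalDist().cdf(z)

# Obama too high by 2.5 AND McCain too low by 2.5: about 1.4 SDs.
p_symmetric = one_sided_p(2.5, sd)

# Obama's number alone off by the full 5 points: about 2.8 SDs.
p_obama_only = one_sided_p(5.0, sd)

print(f"1.4 SDs: {p_symmetric:.3f}, 2.8 SDs: {p_obama_only:.4f}")
```

Under these assumptions the 1.4-SD case comes out near the "8 or 9 percent" figure, and the 2.8-SD case is well under one percent. Which scenario is the right one depends on how the two candidates' errors are assumed to move together.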

I think what it really means is that the computed percentage of those favoring McCain, for example, lies in a band of plus or minus 3.5% of the true value, not vice versa.

I think that's right, though as a practical matter there's not much difference between the two. The point is that the actual breakdown is not a random variable, and thus has no probability associated with it. It's the poll results that are random variables.

if the range of values for the confidence interval overlapped you couldn't say the values were statistically different.

But the confidence interval is an arbitrary choice. There's no particular reason, in a political poll, to think of a 95% interval as some sort of law of nature. Why not 90%? And there's a difference between, "We cannot say with 95% confidence that Obama is ahead," and "It's a tie."

The fact is that a candidate who in fact has a lead of any size is very unlikely to come out behind in a survey of 700 voters.
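How unlikely can be put into rough numbers. The sketch below uses a normal approximation to the sampling distribution of the gap between two shares in a single poll, with variance (p1 + p2 - (p1 - p2)^2) / n. The function name `prob_leader_trails` and the 52-42 example are assumptions for illustration; 700 is the sample size mentioned above, and simple random sampling is assumed.

```python
from math import sqrt
from statistics import NormalDist

def prob_leader_trails(p_lead, p_trail, n):
    """Approximate chance that the true leader shows up behind in a poll of n voters."""
    diff = p_lead - p_trail
    # Variance of the difference of two shares drawn from the same sample.
    sd_diff = sqrt((p_lead + p_trail - diff ** 2) / n)
    return NormalDist().cdf(-diff / sd_diff)

# A true 50-45 race and a hypothetical true 52-42 race, each polled with 700 voters.
p_five = prob_leader_trails(0.50, 0.45, 700)
p_ten = prob_leader_trails(0.52, 0.42, 700)

print(f"5-point lead: {p_five:.3f}, 10-point lead: {p_ten:.4f}")
```

The probability shrinks quickly as the lead grows, and it is larger for leads smaller than the ones shown, so the function is worth trying with other values.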

The point is that the actual breakdown is not a random variable, and thus has no probability associated with it.

I'm not completely sure what you mean by "the actual breakdown", here, but you appear to be saying that the mean is not of any importance, and in fact cannot be assigned a probability. Which seems self-contradictory to me, so I probably misunderstood.

I want to further caveat the above by stating that my expertise, if you can call it that, is more in stochastic processes and modeling than sample statistics, so I may have missed something.

"The mean" should have been communicated as "the true mean", not to be confused with the sample mean. Fail.

WELL, I THIMK THAT YOU SHULD BAN ALL CAPS POSTTINGS, AND REQIRE PEEPLE TO USE A SPELL CHEKKER!

Sorry, posted in wrong thread, please ignore.

I wonder if we can get him banned for doing that?

Speaking as a non-statistician, the problem I have with CNN's spin on this is that, if the election were being held tomorrow, CNN would be mocked openly for declaring a 5 point lead by anyone as a statistical dead heat. That's a significant lead, especially for a Presidential contest. And what's more, if the polls turned out to be wrong, the storyline the next day wouldn't be "they were in a statistical dead heat." It would be "how did we get this so wrong?" CNN's been playing this game long enough to know better.

Slarti,

You argue even when I agree with you.

Sorry to be unclear. What I meant was that the actual population mean is a fixed value, not a random variable, so it doesn't have a probability distribution. That doesn't mean it's not important. On the contrary, it's exactly what we're trying to estimate from the sample.

The sample mean, on the other hand, is a random variable, and hence has a probability distribution associated with it. That's because the sample mean depends on the sample we happen to draw.

So it is technically incorrect to say, "There is a 95% chance that the true value lies in the confidence interval." Rather, one should say, "There is a 95% chance that the confidence interval covers the true value."
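The distinction can be illustrated with a small simulation: fix a true value, draw many samples, and count how often the interval built from each sample covers it. This is only a sketch under assumed conditions. The true share of 0.50, the sample size of 700, and the function name `coverage` are illustrative choices, and the interval is the simple normal-approximation one.

```python
import random
from math import sqrt

def coverage(true_share=0.50, n=700, trials=2000, z=1.96, seed=0):
    """Fraction of simulated polls whose 95% interval covers the fixed true share."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one poll: each respondent favors the candidate with prob. true_share.
        yes = sum(rng.random() < true_share for _ in range(n))
        p_hat = yes / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        # The true share never moves; the interval does, from poll to poll.
        if p_hat - margin <= true_share <= p_hat + margin:
            hits += 1
    return hits / trials

rate = coverage()
print(f"intervals covering the true share: {rate:.3f}")
```

The true share stays put in every trial while the interval moves with the sample, which is why the 95% describes the interval-building procedure rather than the true value.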

How important this is in casual discussions is another matter.

Bernard, you should take my previous response to you as more of an ongoing-clarification-attempt than argumentation.

Fortunately, you've made yourself sufficiently clear, and I have nothing more to say (other than: thanks for your responses, as well as your patience).

Stupidity has a well known right-wing bias.
