Statisticians Warn Of “Systemic Mainstream Misinformation” In Poll Data
Antagonism isn’t perpetual
If you glanced recently at the polls and the election markets, you would be forgiven for believing that a landslide election is looming. It's likely not, and the spreads have the potential to revert in surprising ways between now and Election Day. The drumbeat of negative news against Donald Trump may not cause further damage. We have argued repeatedly, starting on October 11 and October 12, that Hillary Clinton's runaway spread would revert (here, here, here, here).
Of course that's a stand taken against a popular headwind, but it is also an opportunity to make money on an election bet that is mispriced. For example, when we wrote the reversion article, the betfair ask that Mr. Trump's popular vote share would remain in the 40s was priced at only 1:6 odds. Nate Silver's 538 site also reflected this, as shown below. But we, and other academic statisticians, knew that this was a faux election probability, and advised thousands to remain vigilant against planned mainstream misinformation.
Incidentally, today's betfair bid is 20% higher; not many investments have risen 20% in just the past couple of days. And the wager could still deliver a 500% profit, exposing how steeply deluded the polls have been. This article isn't merely about gambling, though. It goes to the heart of what makes polls differ from one another, and across time, and of what we should be cautious about when interpreting them, given that we almost never read (and sometimes cannot access) the underlying probability details of how a poll was generated. In particular, we'll delve into the inconspicuous L.A. Times poll here, which for much of the past month showed Donald leading Hillary. How did they arrive at that, and what value is there in paying attention to alleged outliers?
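The arithmetic behind those odds can be sketched in a few lines of Python. This is a hedged illustration only: we assume the "1:6" price corresponds to Betfair-style decimal odds of 6.0, which is consistent with the 500% profit figure above.

```python
def implied_prob(decimal_odds):
    """Implied probability from Betfair-style decimal odds
    (total return per unit staked, including the stake)."""
    return 1.0 / decimal_odds

def profit_pct(decimal_odds):
    """Percentage profit on the stake if the bet wins."""
    return (decimal_odds - 1.0) * 100.0

odds = 6.0                 # assumed decimal equivalent of the "1:6" price
p = implied_prob(odds)     # about 0.167 implied probability
gain = profit_pct(odds)    # 500.0, i.e., the 500% profit cited above
```

Note that an implied probability of roughly 17% is the market's price, not a calibrated forecast; the whole argument here is that the two can diverge.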
Recently the New York Times (NYT) wrote a piece arguing that the USC/L.A. Times (LAT) poll was biased against Hillary Clinton by at least 4 percentage points, through the exaggerated sampling of one Black Chicago youth. The NYT thesis for sampling issues was not based on general theory at all, but only on the fact that the survey respondent was a fervent Donald Trump supporter. Apparently the LAT had always been a good pollster until this one Black man became a Trump supporter; now the LAT poll is suddenly, comprehensively terrible. Right… The NYT was both smart and correct in pointing out the seeming anomaly, but it also misdiagnosed the root cause of the puzzle.
The LAT should retain their entire sample, and not simply alter responses because the pollster doesn't like what he or she hears. Removing select responses has the same effect, and this is partly why mainstream pollsters have systematically disfavored Republicans in nearly 2/3 of the past several decades' elections in which there was a meaningful surprise in the general election outcome. And in every case where such a reversal of fate led to an actual victory for the October polling laggard, it was a Republican who won. This should give everyone pause about the strength of these "scientific" polls. We often see something misrepresented, yet masquerading as disciplined science.
Now the LAT pollster allows for some interesting statistical features that are not in other polls (many of which follow our blog). For example, it allows survey participants to partially self-weight their own responses, and it factors in each participant's prior voting record. These are worthy developments in most cases, including the case here of the 19-year-old Black Trump supporter. Polling has to fill in a lot of gaps, particularly this year, when there are a greater than normal number of undecideds and non-responders. This increases the error rather than lessening it (per our viral article here, read by more than 100 thousand people, including senior advisers of both parties). And the fact that most other polls do not scale their survey responders accordingly equally leads to a higher than expected favorability (based only on momentum) for those who, for now, agree with Ms. Clinton more than with Mr. Trump.
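To see how much a single heavily weighted respondent can move a topline, here is a minimal sketch; the panel size and the 30x weight are purely illustrative assumptions, not the LAT's actual figures:

```python
def weighted_share(responses, weights):
    """Weighted proportion of respondents supporting a candidate.
    responses: 1 for a supporter, 0 otherwise."""
    return sum(r * w for r, w in zip(responses, weights)) / sum(weights)

# Hypothetical panel: 1000 respondents split evenly, all at weight 1.0
responses = [1] * 500 + [0] * 500
weights = [1.0] * 1000
base = weighted_share(responses, weights)      # 0.500

# Give one supporter an extreme (hypothetical) weight of 30
weights[0] = 30.0
shifted = weighted_share(responses, weights)   # about 0.514
# One outlier weight moves the topline by roughly 1.4 points
```

Under these assumed numbers, even an extreme weight on a single respondent shifts the headline figure by only a point or two, which is the scale of effect at issue in the dispute over the LAT poll.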
Of course we know that across all polls this year there is a perception that Hillary has a polling edge when it comes to "perceived" favorability or social desirability (it's been noted that 10-15% of people have lost a friend over the 2016 election); though this conflates with the overall bias going back many decades, so it's unclear how much additional bias comes from that. But the NYT overestimates the edge the LAT would gain even if this one Black youth were completely off in his responses; it is only about 1-2 percentage points. That is not enough to close the nearly 5-10 percentage point gap between the LAT and the rest of the mainstream polls. The NYT is correct that the overweighting by the LAT may exist, in that this one individual is weighted somewhat more heavily than the typical person. But that does not negate the data point altogether. Does anyone credibly believe that not a single Black person is going to vote for Donald Trump?
The bottom line is that polls on the fringes (e.g., the LAT and, to a lesser degree, the trends in the conservative-advocating Rasmussen, both showing Mr. Trump leading for much of the past month) should be taken a little more seriously for the informative value they provide about how the many undecideds and non-responders will ultimately vote. In historical polling data, once people make up their minds about a candidate, support rarely subtracts further from current polling levels. It is doubtful, therefore, that any new negative information about Donald would compel someone, at long last in these final weeks, to switch allegiances. And while poll-of-polls averaging works well to reduce the variance of errors, it does nothing to counter any systematic error that may be hurtling through the current election cycle. This is a significant lesson that remains lost on political hacks keen to simply analyze the data.
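The variance-versus-bias point can be demonstrated with a short simulation. The numbers are assumed for illustration only: a 3-point bias shared by every poll, plus 2 points of independent noise per poll.

```python
import random

def average_of_polls(n_polls=10, n_trials=2000, true_margin=0.0,
                     shared_bias=0.03, noise_sd=0.02, seed=1):
    """Simulate averaging n_polls polls that share a common bias.
    Averaging shrinks the independent noise, but the shared bias
    passes straight through to the poll-of-polls estimate."""
    rng = random.Random(seed)
    trial_means = []
    for _ in range(n_trials):
        polls = [true_margin + shared_bias + rng.gauss(0.0, noise_sd)
                 for _ in range(n_polls)]
        trial_means.append(sum(polls) / n_polls)
    return sum(trial_means) / n_trials

estimate = average_of_polls()
# estimate sits near 0.03 (the shared bias), not the true margin of 0.0:
# no amount of averaging removes an error common to every poll.
```

The spread of the trial means shrinks as more polls are averaged, but their center stays anchored at the shared bias; that is exactly the failure mode the poll-of-polls theory cannot fix.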
A final note: be wary of taking too seriously the political advice of people who so recently erred badly in the primary elections! This is not to cast a spotlight on any one individual, since the entire field of data journalism has just seen a catastrophic result over the past year. But it's clear from the polling and the prediction-market levels that the grave lessons of the past have not yet been learned. This summer's Brexit vote was just another example of election-eve overconfidence from pollsters and bookies. Yet stateside we still see false confidence promoted on preposterous polling statistics. The pursuit of media ratings must bear some of the blame, since news demands easily digestible insight that beguiles its patrons. And if the overwhelming uncertainty surrounding these election predictions were laid bare, no one would bother paying further attention. All the more reason for you to pay some attention to the outlier polls, especially this year!