Fun with Statistics and the Gun Control Debate
Gun control advocates often contend that more gun deaths occur when guns are abundant in society. The statistics they use to make the case are not persuasive.
The most commonly cited evidence for the linkage is that the states with higher gun death rates also tend to be the states in which gun owners are a higher percentage of the population.
The scatter plot in Figure 1 compares the gun death rate to the gun ownership rate. One can readily see that states with lots of gun owners also tend to have high gun death rates. It is not a perfect relationship, but any objective observer will accept that the scatter plot suggests a linkage.
The problem is that gun control advocates often treat this scatter plot as prima facie evidence of causation when in fact it suggests only a possible relationship. Because this statistical "science" appears to support their theory, gun control advocates rarely care to look into the matter any deeper. This is a subtle form of confirmation bias.
The scatter plot in Figure 1 is based on the most commonly used data (CDC and Kalesan et al.). It is an accurate representation of what is referred to in statistics as linear regression. On the vertical axis is plotted the phenomenon we are trying to explain: the gun death rate. By convention, the horizontal axis is used to plot the theorized cause of variations in the gun death rate, in this instance the gun ownership rate. Each point represents an individual state. The straight line running through the points (the regression line) passes through the middle of the cloud of points; in the standard least-squares method it is placed so that the sum of the squared vertical distances from the points to the line is as small as possible. Its slope summarizes the average amount of change in the gun death rate associated with a given amount of change in the gun ownership rate.
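To make the regression line concrete: in ordinary least squares, the usual form of linear regression, the slope is the covariance of the two variables divided by the variance of the horizontal-axis variable, and the line always passes through the point defined by the two averages. The Python sketch below uses made-up numbers for five imaginary states (not the CDC or Kalesan et al. data), and the function name `fit_line` is purely illustrative.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(x, y) / variance(x); the 1/(n - 1) factors cancel.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical states: ownership rate (%) on x, deaths per 100,000 on y.
ownership = [10, 20, 30, 40, 50]
deaths = [4, 7, 8, 12, 14]
slope, intercept = fit_line(ownership, deaths)
print(f"deaths ~ {intercept:.2f} + {slope:.3f} * ownership")  # deaths ~ 1.50 + 0.250 * ownership
```

The slope (0.25 in this invented data) is the "average amount of change" described above: each additional percentage point of ownership goes with a quarter of a death per 100,000 people.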
There are all sorts of problems with the accuracy and the relevance of the data used to construct this scatter plot. A serious evaluation of the possible relationship between gun deaths and gun availability would call for better data, but since the objective in this essay is only to reveal some pitfalls associated with conclusions based on regression and correlation, there is good reason to use the same data sources as those who draw faulty conclusions from them.
Statisticians rely on a computed number known as the correlation coefficient to measure the degree to which the plotted points line up on the regression line. If all plotted points fall exactly on an upward-sloping line, the correlation coefficient will be one (for a downward-sloping line, negative one). If the points are randomly scattered, it will approach zero. Virtually any scatter plot that involves human behavior rather than physical processes will fail to approach either extreme.
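The correlation coefficient used with linear regression is, in the usual case, Pearson's r: the covariance of the two variables divided by the product of their standard deviations. The Python sketch below computes it by hand; the helper name `pearson_r` and the sample numbers are illustrative and are not taken from the post's data. It shows the two extremes (points exactly on a rising or a falling line) and an in-between case.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))   # 1.0  (every point on a rising line)
print(pearson_r(xs, [10 - 3 * x for x in xs]))  # -1.0 (every point on a falling line)
print(pearson_r(xs, [3, 1, 4, 1, 5]))           # scattered points: somewhere in between
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity.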
The scatter plot presented in Figure 1 has a computed correlation coefficient of 0.697. This is over two-thirds of the way from zero to a perfect one and suggests a strong relationship between gun deaths and gun ownership.
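Another common way to read a correlation coefficient is to square it. For a straight-line fit, r squared (the coefficient of determination) is the share of the variation in the vertical-axis variable that the regression line accounts for. The arithmetic below applies this to the 0.697 reported for Figure 1; it is a back-of-the-envelope sketch, not a reanalysis of the data.

```python
r = 0.697  # correlation reported for Figure 1 (gun deaths vs. gun ownership)

# Check the "over two-thirds of the way from zero to one" reading.
print(r > 2 / 3)  # True

# Coefficient of determination: share of variance accounted for by the line.
r_squared = r ** 2
print(f"r^2 = {r_squared:.3f}")                    # r^2 = 0.486
print(f"unexplained share = {1 - r_squared:.3f}")  # unexplained share = 0.514
```

On this reading, even the relationship in Figure 1 leaves roughly half of the state-to-state variation in gun death rates unaccounted for by the straight-line relationship with gun ownership.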
But sometimes a lot can be learned by comparing the correlation coefficients for two different relationships. Let's look at data for vehicular death rates and vehicle ownership rates. For both gun deaths and highway deaths, we might expect that a higher prevalence of the device involved (guns or cars) will contribute to the likelihood of the event occurring. Once again, the data are taken from the most commonly used sources (IIHS and Wikipedia).
Figure 2 shows the scatter plots and correlation coefficients for both the gun death question and the highway death question. The correlation coefficient for explaining gun deaths is much higher than that for explaining vehicle-related deaths: 0.697 vs. 0.355. The relationship between gun deaths and gun ownership is much stronger than the relationship between car deaths and car ownership. The "guns cause gun deaths" theory continues to hold up, whereas the "cars cause car deaths" theory looks weak. The two theories rely on comparable logic, but the statistics seem to give more credence to the "guns cause gun deaths" theory.
But now let's attempt what, for a social scientist, may be the closest thing to a negative control: let's test the anti-theories. Let's use the same methodology to see whether car ownership causes gun deaths and whether gun ownership causes car deaths. Both propositions defy logic, so we would expect their correlation coefficients to be close to zero. If they are not, then correlation may be less useful than expected, since it would fail to distinguish between reasonable and unreasonable indicators of causality.
Figure 3 shows the results.
For the proposition that car ownership causes gun deaths, the correlation coefficient is 0.349. For the idea that gun ownership causes car deaths, it is 0.625. All of a sudden, things are getting confusing. Can we really believe that gun ownership is almost as good an explanation for car deaths as it is for gun deaths (0.625 vs. 0.697)? Can we honestly contend that car ownership is as good (but not very good) at explaining gun deaths as it is at explaining car deaths (0.349 vs. 0.355)?
Finally, here is the ultimate absurdity. Figure 4 provides the scatter plot and correlation coefficient for the silly idea that car deaths cause gun deaths. The correlation coefficient of 0.788 (!) is higher than all the others we have seen. Shall we conclude that car deaths do more to explain gun deaths than gun ownership does? To do otherwise would be to value intuition and perception over statistical measurements — not exactly a scientific approach to the matter.
Maybe some as-yet-undiscovered factor accounts for variation in both gun deaths and vehicle deaths. Perhaps more likely is that the data are seriously flawed. In any event, the simple statistical approach so commonly used to ascertain the "cause" of gun deaths is inadequate.
Before we use statistics to address a problem of this sort, we must become knowledgeable about guns, about the data, and about how to interpret correlation and regression. Before we buy into "causality," we need to know the dangers of accepting the superficially self-evident.