Method, Method, It's All In The Method

May 15, 2009


[First published September 29, 2005] I don’t see the need to respond to every criticism of my research unless doing so casts more light on the democratic peace, or on the incredible democide of the last century, carried over into our new one by the ruling thugs in Burma, Sudan, and North Korea. For this reason, I will pay some attention to Dr. John Grohol’s item on my research and the attendant criticism (here).

I will quote his points and respond to each below:

Rummel’s conclusions have been criticized for the lack of a definitive correlation. He neglects current conflicts between Israel and Palestine as well as India and Pakistan, all of which are democratic nations–although Rummel’s defenders would retort that Palestine was never a real democracy until 2005, and that Pakistan is ruled by a strongman who wields a great deal of undemocratic power.

Moreover, were Israel truly at war with Palestine, Palestine would be destroyed due to the enormous disparity of power, and if Pakistan and India were truly at war with each other then tens of millions would die. Rummel’s real point is that democracies rarely go to war with each other, and liberal democracies (defined by free speech, free press, and universal franchise) never do. Neither Pakistan nor Palestine, at this time, qualifies as a liberal democracy.

RJR: He raises the criticism and then rebuts it himself.

Rummel’s conclusions have also been criticized for not considering the number of deaths due to anarchy and the lack of government, through mechanisms such as civil conflict, the breakdown of society, and foreign invasion.

RJR: I do, and my estimates for each country include those for war dead and internal nondemocidal violence. Moreover, the most anarchic system is international relations, whose wars I have tallied and included in my analysis.

Some have found the data that he uses to be questionable.

RJR: This is unhelpful. Details please.

Other people point out that his methods of calculation of the death toll are highly controversial. He compares the statistical data before and after a certain date and derives an estimate about the number of killings that occurred between.

RJR: This is called interpolation, and which interpolations are supposed to be wrong is not detailed.
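
For readers unfamiliar with the term, interpolation in this context is simply estimating a figure for a date that falls between two recorded ones. A minimal sketch with purely hypothetical numbers:

```python
# Linear interpolation: estimating a value for a date that falls
# between two recorded figures. The numbers here are hypothetical,
# for illustration only.

def interpolate(t0, v0, t1, v1, t):
    """Linearly interpolate the value at time t between (t0, v0) and (t1, v1)."""
    fraction = (t - t0) / (t1 - t0)
    return v0 + fraction * (v1 - v0)

# Suppose a cumulative toll of 100,000 is recorded for 1930 and
# 400,000 for 1940; the interpolated estimate for 1935 is midway.
estimate_1935 = interpolate(1930, 100_000, 1940, 400_000, 1935)
print(estimate_1935)  # 250000.0
```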

However, he fails to establish evidence of actual killing.

RJR: No indication of which estimates of mine were wrong. I use all kinds of documents to establish democide, such as refugee reports, memoirs, biographies, historical analyses, actual exhumed body counts, records kept by the murderers themselves, and so on.

Moreover, his results are based on an absolute trust in statistical data and statistics are prone to errors. However, he himself uses the wider sense of “killed by”, including all kinds of “reason-result” relationships between acts of government and actual deaths. Moreover, in calculating the number of victims, he doesn’t feel he needs evidence of a death; the result of statistical calculation is, for Rummel, effective proof that death occurred.

RJR: Wrong. This deserves a full response: I don’t believe any of my estimates of democide tell the true death toll. Nor do I believe anyone will ever know the precise number of people murdered in any democide, including the Holocaust (estimates in this best of all studied genocides and with the best archival and other records still differ by over 40%). Then what is the purpose of estimating democide? Two reasons dominate: moral assessment, and related scientifically based policy. Democide is a crime against humanity, one of the worst crimes the rulers or leaders of a government can commit. But there are levels of democide, and I see a moral difference between rulers that murder at different orders of magnitude (powers of ten). That is, I find the evil of a Stalin who most probably murdered over 20,000,000 people (and this seems to encompass 99.9 percent of all estimates) greater than rulers who murdered 1,000, 10,000, 100,000, or even 1,000,000. More specifically, my moral gauge clicks in at orders of magnitude. (There are other moral gauges, of course, such as the proportion of a population murdered; how people were murdered, such as randomly or by ethnicity or race; whether the intent was genocide or revenge, etc.) The moral question for me is then whether an estimate captures the order of magnitude. While I don’t think we can ever get a true estimate, I do think we can bracket the range of estimates within which the true value must be found, either absolutely or probabilistically.

As to the second criterion for accepting an estimate, my concern is to forecast the most likely order of magnitude of democide based on the characteristics of a society, nation, culture, ruler, leadership, people, geography, and so on. This is a scientific problem and engages methodological and technical questions inappropriate here. What is appropriate to the question of errors in democide estimates is at what level of error we get meaningful enough results to define the causation involved in democide, when no actual estimate is true. And since the estimates are usually close enough in magnitude to enable us to rank nations, and divide them into groups of more or less, we have enough precision to carry out scientific tests as to what causes democide.

For an example of alleged manipulation: Rummel estimates the death toll in the Rheinwiesenlager (see here) as between 4,500 and 56,000. Official US figures were just over 3,000 and a German commission found 4,532. The high figure of 56,000 also merited the notation “probably much lower” in Rummel’s extracts.

RJR: Misleading. This is about the German POWs who died in American camps after the war due to mistreatment and lack of care. The different estimates I used are recorded here (lines 228-237). As you can see, the estimates generally are close to the ones given above, and I end up with a range of 3,000 to 56,000, with a most probable estimate of 6,000. Grohol does not understand that the low and high are meant to be the most unlikely low and high, and thus to bracket the probable true count (I did point this out). It is to determine these lows and highs that I include what some others might consider absurd estimates. And in this case, my low and high do bracket the figures he gives.

Another flaw in Rummel’s statistical calculations is that he doesn’t use error margins.

RJR: Of what meaning are error margins when dealing with the universe of data, and not a sample? For example, if one takes a poll of 1,000 people about their opinion on the Iraq war, the result may be 48 percent favorable within a margin of error of 2.4 percentage points. But, if the poll is taken of all American adults, this is the universe and there is no margin of error or standard error. I am dealing with all estimates available in English for ALL NATIONS over a period of a century, and available in the libraries I worked in, including the Library of Congress. In no way can these estimates be considered a sample, not even a sample of all estimates (say, those in the Russian, Chinese, and Korean archives), since the estimates I used were not randomly selected, or selected in any statistical sense.
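
As a sketch of where a poll's margin of error comes from (the exact figure depends on the confidence level chosen; the conventional 95 percent level is used below):

```python
import math

# The standard margin-of-error formula for a sample proportion.
# It applies only to random samples; for a complete universe there
# is no sampling error to estimate, which is the point made above.

def margin_of_error(p, n, z=1.96):
    """Half-width of the confidence interval for a proportion;
    z = 1.96 gives the conventional 95 percent confidence level."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return z * se

# A poll of 1,000 people with 48 percent favorable:
moe = margin_of_error(0.48, 1000)
print(round(100 * moe, 1))  # 3.1 percentage points at 95 percent confidence
```

A narrower margin, such as the 2.4 points in the example above, corresponds to a lower confidence level (a smaller z).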

Link of Day

“5 yrs of intifada: 1,061 Israelis killed”


“Palestinians celebrate five years of terror war”

Yes, celebrating the murder en masse of unarmed civilian women and children, mothers and fathers, and sometimes whole families, walking the street, eating in restaurants, dancing in a club, or marketing. Some who survived paralyzed, with the loss of limbs, blinded, or suffering lifelong internal injuries might envy the dead. And genocide scholars, mainly American and European Jews who tend to side with the Palestinians, refuse to recognize the genocide it was. A case of genocide denial by the very people who are outraged at those who deny the Holocaust. But there is no denial by the Palestinians; there is celebration.

Links I Must Share

“An Islamic guide on how to beat your wife”
And leave no marks.

“Top U.S. Military Intel Officer: Zarqawi ‘Hijacked’ Insurgency”

“The Mother of All Connections,” by Stephen F. Hayes & Thomas Joscelyn, in The Weekly Standard:

From the July 18, 2005 issue: A special report on the new evidence of collaboration between Saddam Hussein’s Iraq and al Qaeda.

Excellent article. Read and inform yourself.

“Somaliland in first vote for MPs”
Another new democracy. Cheers.

Democide data estimation method

On That Mysterious “p” In Quantitative Reports

January 9, 2009

[First published March 15, 2006] Everyone who does a lot of reading of reports, studies, and the like is bound to run across “p < .05” or some other fraction, such as “p < .0005”. Or, instead, they will read that “the results are significant,” or “not significant.” What is going on here? I will try to explain this nontechnically, without all the details beloved of the statistician (no Type I and Type II error, no two-sided test versus one-sided, no normal distribution, no equations, etc.), and even oversimplify for the purpose of clarity.

To begin, p stands for “probability.” Thus, p < .05 means that the probability is less than .05. For example, if there are 100 balls in a basket and 4 of them are red, the probability of blindly selecting a red ball is .04, which is p < .05: less than 1 chance in 20, or less than 5 chances in 100. But this understanding by itself can be misleading if a sample of some sort was analyzed.
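
The basket example can be checked directly, both by arithmetic and by simulating many blind draws:

```python
import random

# The basket example: 4 red balls among 100, so the chance of
# blindly drawing a red one is 4/100 = .04, which satisfies p < .05.
p_red = 4 / 100
print(p_red)  # 0.04

# A quick simulation of the long-run frequency (illustrative only).
random.seed(42)
balls = ["red"] * 4 + ["other"] * 96
draws = [random.choice(balls) for _ in range(100_000)]
frequency = draws.count("red") / len(draws)
print(round(frequency, 3))  # lands near 0.04
```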

For example, assume that a randomly selected sample of some sort has been analyzed, say of 100 American college students, in order to determine for the population (universe) of all American college students the correlation between getting drunk at least once a month and grades. Let us say the correlation between such drunkenness and grades is .17, p < .05. How to interpret this? Not as a straightforward probability of getting .17.

Rather, the idea is that one has implicitly tested what is called the null hypothesis that the true correlation for all American students is r = 0, which means that hypothetically there is no correlation between getting drunk at least once a month and grades. The p < .05 then means that if one rejects the null hypothesis and accepts that .17 is true for the population of students, the probability of this choice being in error is less than 1 in 20. Although in research on samples the null hypothesis is usually not stated, it is there nonetheless (some classes in statistics require students to always state the null hypothesis). Regardless of whether the statistic being applied is a t-test, F-ratio, chi-square, or some other, the implicit assumption usually is that for the population the sample represents, the true statistic is zero. Then, the p indicates the chance of error if this hypothetical value for the population is rejected in favor of accepting the one actually found for the sample.

As another example, in a regression analysis on a sample, the resulting regression coefficients may be given with associated t-tests and p-values. Assume, for example, a regression coefficient is 3.4 with a t-test of 2.0 and p < .03. The assumed null hypothesis is that the regression coefficient for the universe the sample represents really is 0, and if this is rejected in favor of the finding that it is 3.4 for the population, the chance of error in doing this is less than .03. That is, if the null hypothesis were true and this study were replicated over 100 times, fewer than 3 of the replications would be expected to produce a coefficient this large just by chance.
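
The p attached to such a t-test can be recovered from the t-statistic itself. A sketch using the one-sided normal approximation (the exact value depends on the regression's degrees of freedom, via the t distribution):

```python
import math

# One-sided tail probability of the standard normal at t: a large-
# sample approximation to the p-value of a t-test.

def p_one_sided(t):
    return 0.5 * math.erfc(t / math.sqrt(2))

# The example above: a t-test of 2.0 on the regression coefficient.
p = p_one_sided(2.0)
print(round(p, 3))  # about .023, hence p < .03
```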

I have made the null hypothesis equal to 0, which is generally the case. But it can equal any number. Regardless, the question is still answered by p: how probable it is that the research will be in error if it rejects the number given for the population in the null hypothesis in favor of the number found by the research.

When a null hypothesis is rejected with little chance of error, the result is called significant. The acceptable probability of error (significance) among scientists is a matter of tradition, which is that if the chance of error is p equal to or less than .05, the result is significant. This is a convention, however, and a researcher may be conservative about error and define significance in his research as p < .01. Or, if the researcher believes there is much random error in his data, he may then raise the significance level to something like p < .1. In other words, when a study says its correlation is significant, it is saying in effect that its correlation is such that there is little chance of error in rejecting the possibility that it is zero (or some other number). If a study says it has conducted a significance test, it is saying that it calculated the p-value; and if it says the result was nonsignificant, it means that the chance of error in rejecting the null hypothesis was too great. But, without knowing the p-values, there is no way of knowing what chance of error the researcher found acceptable or unacceptable.


The danger in significance tests is that the p-value is strongly dependent on the sample size (N). See how the significance (p-value) of the very low correlation of .15 changes at different sample sizes:

N = 10, p = .34
N = 50, p = .15
N = 100, p = .07
N = 500, p = .0004
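
These values can be reproduced approximately from the standard conversion t = r * sqrt((N - 2) / (1 - r^2)), here combined with a one-sided normal approximation to the t distribution (so the smallest sample sizes come out a few thousandths off the exact figures):

```python
import math

# Significance of r = .15 at different sample sizes. The correlation
# is converted to a t-statistic, then to a one-sided p-value using
# the normal approximation to the t distribution (small-N values
# therefore differ slightly from exact t-distribution tables).

def p_for_correlation(r, n):
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return 0.5 * math.erfc(t / math.sqrt(2))  # one-sided normal tail

for n in (10, 50, 100, 500):
    print(f"N = {n:3d}, p = {p_for_correlation(0.15, n):.4f}")
```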

All one needs to do, it seems, is to increase the sample size to get very significant results, although totally meaningless ones. What does this mean? To understand what the correlation coefficient means for the relationship between two variables, square it and multiply by 100. The result will be the percent of variance (variation) in common, or shared, between the two variables. So, if one does this for r = .50 (to make this easy), the result is 25%. To say that two variables have 25% of their variation in common is a lot more meaningful than saying that their correlation is .5. Thus, an r = .80 means 64% of the variation is in common; r = .90 means 81% in common, and so on. This is a way of getting at the true empirical meaning of a correlation, and one that is not dependent on sample size and the significance test.
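
The squaring rule can be written out directly:

```python
# Variance in common between two variables: square the correlation
# and multiply by 100 to express it as a percentage.

def variance_in_common(r):
    return 100 * r ** 2

print(variance_in_common(0.50))            # 25.0
print(round(variance_in_common(0.80), 2))  # 64.0
print(round(variance_in_common(0.01), 4))  # 0.01: virtually no relationship
```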

And this displays a major problem with significance tests. Consider a correlation of .01 from a very large sample, with a significant p = .041. By convention, this itty-bitty correlation is significant, and the unwary researcher might so report it. But it is meaningless. For the variation in common between the two variables is an incredibly low .01%, or virtually a zero relationship. And yet, it is significant! Always consider the variance in common along with the significance test.

Another problem is that the null hypothesis and its significance test assume that the analysis is carried out on a sample selected in some appropriate way to reflect a population. But, the analysis may be of all nations, all American senators, all students at Yale, and so on. There is no sample. One might say, however, as some researchers have tried to do, that this is a sample of all nations, senators, or Yale students that have existed, or will exist. But, then the problem is that the sample is in no way a randomly selected representation of this population, which violates an assumption of the significance test.

So, the usual significance tests are inappropriate when analyzing a whole population. If, for example, r = .60 for the relationship between development and literacy for all 192 nations in 2005, then there can be no null hypothesis, since this correlation is truly .60 for all nations. Yet, as some readers may have noticed, I have p-values scattered throughout my research even though I am analyzing all nations.

There is another way to look at probability than for samples. One can, for example, calculate the probability of tossing a coin and getting five heads in a row; of not getting a seven in ten tosses of the dice; and of none of the 122 democracies among 192 nations having had any of the conflicts in that year among themselves. Such probabilities can also be given as p-values. The p of getting five heads in a row is the probability of getting one head in one toss (= .5) raised to the fifth power, which is p = .03125, or p < .05. This would then be significant.
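
The coin-toss case can be checked directly, since five independent heads multiply to .5 raised to the fifth power:

```python
# Five heads in a row: the probability of one head (.5) raised to
# the fifth power.
p_five_heads = 0.5 ** 5
print(p_five_heads)  # 0.03125, i.e. p < .05: significant by convention
```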

So, in my post yesterday, my analysis of variance of the relationship between the terrorism/human rights scale and freedom produced an F-statistic of 81.6, p < .0001. This is saying that, for all the data on the two variables, the chance that they would line up for all nations such that one would get this F-statistic is less than .0001, almost a 0 probability, and thus very significant. There must be something causing such a near impossible pairing, and I say that it is the democratic nature of a regime.

Thus, in the case of analyzing the whole population, instead of its representative sample, the p now is the chance of getting any statistic, such as a correlation, multiple correlation, regression coefficient, chi-square, and so on, just by chance.

Related Links

“Statistical Significance”:

“Significance level” is a misleading term that many researchers do not fully understand. This article may help you understand the concept of statistical significance and the meaning of the numbers produced by The Survey System.

“Statistical Significance”:

What does “statistical significance” really mean?

Elementary Concepts in Statistics

Understanding Correlation