Assessing the role that chance may have played in fair tests

The role of chance can lead us to make two types of mistakes when interpreting the results of fair treatment comparisons: we may either mistakenly conclude that there are real differences in treatment outcomes when there are not, or that there are no differences when there are. The larger the number of treatment outcomes of interest observed, the lower the likelihood that we will be misled in these ways.

Because treatment comparisons cannot include everyone who has had or will have the condition being treated, it will never be possible definitively to find the ‘true differences’ between treatments. Instead, studies have to produce best guesses of what the true differences are likely to be.

The reliability of estimated differences will often be indicated by ‘Confidence Intervals’ (CI). These give the range within which the true differences are likely to lie. Most people will already be familiar with the concept of confidence intervals, even if not by that name.

The 95% Confidence Interval (CI) for the difference between Party A and Party B narrows as the number of people polled increases.

For example, in the run-up to an election, an opinion poll may report that Party A is 10 percentage points ahead of Party B; but the report will then often note that the difference between the parties could be as little as 5 points or as large as 15 points. This ‘confidence interval’ indicates that the true difference between the parties is likely to lie somewhere between 5 and 15 percentage points.

The larger the number of people polled, the less uncertainty there will be about the results, and therefore the narrower will be the confidence interval associated with the estimate of the difference.
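The narrowing can be sketched with a short calculation. This is only an illustration under stated assumptions: the party shares (45% and 35%) and the poll sizes are hypothetical figures chosen to match a 10-point lead, the function name `poll_difference_ci` is invented for this sketch, and the interval uses the common normal approximation for two shares measured in the same poll rather than whatever method a particular polling organisation applies.

```python
import math

Z_95 = 1.96  # normal-approximation multiplier for a 95% interval


def poll_difference_ci(share_a, share_b, n, z=Z_95):
    """Approximate confidence interval for (share_a - share_b).

    Assumes both shares come from the same simple random poll of n
    respondents, so the two estimates are negatively correlated.
    """
    diff = share_a - share_b
    # Variance of the difference of two shares from one multinomial sample.
    variance = (share_a + share_b - diff ** 2) / n
    margin = z * math.sqrt(variance)
    return diff - margin, diff + margin


if __name__ == "__main__":
    # Hypothetical poll: Party A on 45%, Party B on 35%.
    for n in (100, 400, 1200, 5000):
        low, high = poll_difference_ci(0.45, 0.35, n)
        print(f"n={n:>5}: {100 * low:5.1f} to {100 * high:5.1f} percentage points")
```

With these assumed shares, a poll of about 1,200 people gives a margin of roughly 5 points either side of the 10-point lead, similar to the range in the example above. Quadrupling the number of people polled roughly halves the margin.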

Just as one can assess the degree of uncertainty around an estimated difference in the proportions of voters supporting two political parties, so also one can assess the degree of uncertainty around an estimated difference in the proportions of patients improving or deteriorating after two treatments.

And here again, the greater the number of the treatment outcomes observed – say, recovery after a heart attack – in a comparison of two treatments, the narrower will be the confidence intervals surrounding estimates of treatment differences. With confidence intervals, ‘the narrower the better’.

A confidence interval is usually accompanied by an indication of how confident we can be that the true value lies within the range of estimates presented. A ‘95% confidence interval’, for example, means that we can be 95% confident that the true value of whatever it is that is being estimated lies within the confidence interval’s range. This means that there is a 5 in 100 (5%) chance that, actually, the ‘true’ value lies outside the range.
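One way to get a feel for the ‘95%’ is to simulate many studies in a setting where the true value is known. The sketch below is illustrative only: the true recovery proportion (30%), the study size (500 patients) and the number of repeated studies (2,000) are all hypothetical, the function names are invented for this example, and the interval is the simple normal-approximation (‘Wald’) interval for a single proportion. It counts how often the interval calculated from each simulated study contains the true value.

```python
import math
import random


def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% interval for a single proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(true_p, n, repeats, seed=0):
    """Fraction of simulated studies whose interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repeats):
        # Simulate one study: each of n patients recovers with chance true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_ci(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / repeats


if __name__ == "__main__":
    # Hypothetical setting: true recovery proportion 30%, 500 patients per study.
    print(coverage(true_p=0.30, n=500, repeats=2000))
```

Under these assumptions the printed fraction should come out close to 0.95: about 95 in every 100 simulated studies produce an interval that contains the true value, and about 5 in 100 miss it.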

  • Steve George

    Overall this is a superb book and website. However, the stated meaning of ‘confidence interval’ is not correct. Maybe this is an intentional simplification because the book and website are intended for a broad audience. However, it makes one suspicious about other claims made by the authors if one of the important aspects is wrong. The correct meaning of a 95% confidence interval is that 95 out of 100 confidence intervals obtained in the same way (same population and same sample size) will include the true mean. To say that there’s a 95% chance that the true mean lies within the confidence interval would mean that there are many different true means, and 95 out of 100 of them fall within this particular confidence interval. Of course there is only one true mean, and it will lie within 95 out of 100 similarly-obtained confidence intervals.

    • Anonymous

      Many thanks for your kind words Steve, and I am sure that the team will want to make sure that everything is as accurate as it can be.

      You are right, in that the intention is to explain confidence intervals for an informed lay reader. I know from experience that this is not easy, and that sometimes an approximation is easier to understand.

      Stay tuned and I will see what they say.

    • Paul Glasziou

      Thanks for your complimentary remarks about the book. We might have used a different approach in our effort to explain confidence intervals, and we discussed this when writing the section. The deliberate simplification we used reflected our experience of trying to explain the precise frequentist interpretation of confidence intervals to lay audiences: this approach either seems to confuse them or goes over their heads. We could also have used Credible Interval and a uniform prior to match our more Bayesian explanation, but that is not the term people are likely to come across.
      We are currently searching systematically for formal comparisons of which of the alternative wordings used to explain research methods best helps lay people to get the right end of the stick. This is one of several issues that we would like to see addressed empirically to improve the evidence base needed to support better understanding of health research. Please let us know if you would like to be involved, and we would also encourage you and readers to become involved in – An international Network to Support Understanding of Health Research.

  • Robert42

    Confidence intervals represent the uncertainty of an estimate attributable to sampling error. Small sample, bigger error, broader confidence interval. Big sample, smaller error, narrower confidence interval. If the sample encompasses all of the sample frame the uncertainty falls to zero and the confidence intervals disappear.

    A 95% confidence interval means that if we were to repeat our test 100 times, the calculated confidence intervals would encompass the mean arrived at through full and complete coverage of the sample frame roughly 95 times. This mean is not a ‘true’ value. Confidence intervals only represent the uncertainty from sampling. There will be other errors in the measurement system that will forever keep us from knowing the ‘truth’. This is why Deming insists that there is no truth in measurement, and why the idea that there is one is so destructive to understanding statistical analysis.

    To conclude, say, that a difference between a treatment and control group is statistically significant at the 95% level only means our experiment would come to similar estimates 95% of the time. It doesn’t mean the difference is real or true. (After all, we already knew the two groups were different.) All manners of statistical significance are comments on the measurement system used, not the reality being measured.