Original Article
The size of a pilot study for a clinical trial should be calculated in relation to considerations of precision and efficiency

https://doi.org/10.1016/j.jclinepi.2011.07.011

Abstract

Objective

To investigate methods to determine the size of a pilot study to inform a power calculation for a randomized controlled trial (RCT) using an interval/ratio outcome measure.

Study Design

Calculations based on confidence intervals (CIs) for the sample standard deviation (SD).

Results

Based on CIs for the sample SD, methods are demonstrated whereby (1) the observed SD can be adjusted to secure the desired level of statistical power in the main study with a specified level of confidence; (2) the sample for the main study, if calculated using the observed SD, can be adjusted, again to obtain the desired level of statistical power in the main study; (3) the power of the main study can be calculated for the situation in which the SD in the pilot study proves to be an underestimate of the true SD; and (4) an “efficient” pilot size can be determined to minimize the combined size of the pilot and main RCT.

Conclusion

Trialists should calculate the appropriate size of a pilot study, just as they should the size of the main RCT, taking into account the twin needs to demonstrate efficiency in terms of recruitment and to produce precise estimates of treatment effect.

Introduction

What is new?

  • Small pilot studies may provide imprecise estimates of the standard deviation (SD), and resulting power calculations for the main study may be correspondingly imprecise, but guidance on the appropriate size of a pilot study is sparse.

  • Basing the power calculation on a value at the upper confidence limit for the SD can provide reassurance that the main study will have the desired level of statistical power.

  • An inflation factor can be used to calculate this adjusted value for the SD, in relation to a pilot study of a given size, and can also be used to determine an adjusted sample size.

  • The size of a pilot study for a randomized controlled trial using an interval or ratio outcome measure should be determined through a calculation based on the precision of the estimate of the SD.

A sample size calculation for a randomized controlled trial (RCT) is undertaken to estimate the minimum number of participants required to detect as significant a prespecified effect, with a stated level of statistical power and at a chosen significance level [1]. Here, the significance level equates to the risk of incorrectly rejecting the null hypothesis (a type 1 error), and power is the probability of detecting as statistically significant an effect of a specified magnitude, if it exists; this is equivalent to the probability of avoiding a type 2 error.

Where the outcome variable of interest is on an interval/ratio scale and the effect in question is a mean difference, the sample size calculation depends in part on the value of the standard deviation (SD) of the outcome variable in the main RCT. This value is unknown and is often estimated by the SD from a pilot study; this process is equivalent to the estimation of a population parameter. However, given that the pilot SD is a random variable, it may be an under- or overestimate of the SD in the main RCT. Accordingly, an RCT may turn out to be under- or overpowered to detect the specified effect, owing to the SD of the data in the trial (on which the hypothesis test is based) being respectively larger or smaller than the value used in the prior sample size calculation. Under- or overestimation of the SD in the main RCT may arise for two reasons. First, the estimate used in the calculation may not be appropriate for the clinical population in which the trial is conducted (e.g., it was derived from a previous study of patients whose age, chronicity, or symptom severity differed from that of the patients in the RCT). That is, the SD used in the sample size calculation may be biased (systematic error). Second, as a random variable, the SD used in the calculation may have under- or overestimated the SD in the main RCT simply through sampling fluctuation (random error).

A pilot study can help to remedy the problem of bias in the estimate of the SD as it can be conducted on the same clinical population as will be included in the subsequent RCT. However, the estimate of the SD from a pilot study may still under- or overestimate the SD in the main RCT through random error. The more pressing concern is the possibility of underestimation of the SD, with the consequence of underestimation of the sample size for the main RCT. This appears to be a common phenomenon [2] and prevents clear conclusions from being drawn from the individual RCTs concerned. It is, therefore, important that a pilot study provides an acceptably precise estimate of the SD so as to reduce the likelihood that the trial is underpowered to detect the prespecified clinical difference. In essence, an acceptably precise estimate of the SD requires the pilot study itself to be of sufficient size. It has been suggested that n = 30 is an acceptable size for a pilot study [3]. With specific reference to estimates of the SD, Julious [4] proposes at least n = 12 per group, equivalent to n = 24 for a traditional two-group study, a figure similar to that proposed by other authors [5], [6]. However, there is otherwise little in the way of specific guidelines on the appropriate size of a pilot study.

The aim of this article is twofold: (1) to guide trialists, at the developmental stage of their research, as to the appropriate size of a pilot study to gain sufficiently precise estimates of the true SD and (2) to help inform trialists, post-pilot, on how to adjust their estimate of the SD, for purposes of the sample size calculation of the main trial, to be confident of not underpowering their main study. Our focus is on the situation in which a prior pilot study is conducted independent of the main RCT, rather than on that in which an internal pilot study is performed; that is, where the required sample size is recalculated on the basis of an estimate of the SD derived from the first patients recruited to the main RCT [5], [6], [7], [8].

Illustrative example

Suppose an RCT is being designed to detect a mean difference in systolic blood pressure of at least 8 mm Hg between two treatment groups. A pilot study is conducted and provides an estimated pooled SD on this scale of 20 mm Hg. Based on this estimate—and assuming 80% power and a 5% two-tailed significance level—100 participants per group would be needed for the analysis of the main study [9].
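The figure quoted above can be approximated with the familiar normal-approximation formula for comparing two means, n = 2(z_{1-α/2} + z_{1-β})² s² / δ² per group. The following is a minimal sketch of that formula, not the procedure behind the cited tables; the helper name `n_per_group` is hypothetical. It gives a value slightly below 100, so the tabulated figure presumably includes a small-sample correction or rounding, which is an assumption on our part.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided
    comparison of two means (hypothetical helper, not from the article)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-tailed test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2


# Illustrative example: SD of 20 mm Hg, difference of 8 mm Hg, 80% power, 5% two-tailed
n_raw = n_per_group(sd=20, delta=8)
print(round(n_raw, 1), ceil(n_raw))
```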

However, the SD for the main study could be either larger or smaller than that estimated in the pilot study,

Standard deviation

To have, for example, 95% confidence of obtaining at least the nominal power of the statistical test in the main study, the researcher must base the sample size of the main study on the upper 95% one-sided CL for the SD from the pilot study. Hence, in relation to the example in the previous section, and assuming a pilot study of n = 20 and a nominal power of 80%, it can be calculated that to be 95% confident of achieving at least the nominal power of the statistical test in the main study, an SD
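The upper one-sided confidence limit for an SD follows from the chi-square distribution of the sample variance: the limit is s × sqrt(df / χ²), where χ² is the lower-tail quantile at one minus the confidence level. The sketch below uses only the Python standard library and assumes df = n - 1 for a pilot of size n; that choice, and all function names, are our assumptions rather than details given in the text. Under these assumptions a pilot of n = 20 with s = 20 gives a limit that rounds to the 27.41 quoted in the Discussion.

```python
from math import ceil, exp, lgamma, log, sqrt


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2, x / 2
    term = total = 1 / a
    for k in range(1, 1000):
        term *= half_x / (a + k)
        total += term
        if term < total * 1e-15:
            break
    return total * exp(-half_x + a * log(half_x) - lgamma(a))


def chi2_quantile(p, df):
    """Invert the CDF by bisection (the CDF is monotone in x)."""
    lo, hi = 0.0, df + 20 * sqrt(2 * df) + 20
    for _ in range(80):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def inflation_factor(n_pilot, confidence=0.95):
    """Ratio of the upper one-sided confidence limit to the observed SD.
    Assumes df = n_pilot - 1 (our assumption)."""
    df = n_pilot - 1
    return sqrt(df / chi2_quantile(1 - confidence, df))


s_pilot = 20                        # observed pilot SD (mm Hg)
if_s = inflation_factor(20)         # pilot of n = 20
upper_sd = if_s * s_pilot           # adjusted SD for the power calculation
n_adjusted = ceil(100 * if_s ** 2)  # sample size scales with the variance
print(round(upper_sd, 2), n_adjusted)
```

Where SciPy is available, `scipy.stats.chi2.ppf` could replace the hand-written quantile function.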

Conditional power of a study

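It is possible to calculate the achieved (conditional) power of a main study, for various values of the inflation factor IF_s, if the true SD were equal to the upper 95% CL but the sample size was determined according to the point estimate of the SD from a pilot study. Writing 1 − β for the nominal power and 1 − β′ for the achieved power, the relevant formula is derived as follows:

$$
\begin{aligned}
n \geq \frac{2\,(z_{1-\alpha/2} + z_{1-\beta'})^{2}\,(\mathrm{IF}_s \times s)^{2}}{|\mu_1 - \mu_2|^{2}} &= \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^{2}\,s^{2}}{|\mu_1 - \mu_2|^{2}} \\
(z_{1-\alpha/2} + z_{1-\beta'})^{2}\,(\mathrm{IF}_s \times s)^{2} &= (z_{1-\alpha/2} + z_{1-\beta})^{2}\,s^{2} \\
(z_{1-\alpha/2} + z_{1-\beta'})\,(\mathrm{IF}_s \times s) &= (z_{1-\alpha/2} + z_{1-\beta})\,s \\
z_{1-\beta'} \times \mathrm{IF}_s &= z_{1-\alpha/2}\,(1 - \mathrm{IF}_s) + z_{1-\beta} \\
z_{1-\beta'} &= z_{1-\alpha/2}\,(\mathrm{IF}_s^{-1} - 1) + z_{1-\beta} \times \mathrm{IF}_s^{-1}
\end{aligned}
$$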
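As a rough numerical check, the final relation of this derivation (the achieved-power quantile equals z_{1-α/2} × (1/IF_s - 1) + z_{1-β} / IF_s) can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, and the inflation factor is taken as 27.41 / 20 from the figures quoted in the Discussion. It returns a power in the low-to-mid 50% range, in line with the approximate figure reported there.

```python
from statistics import NormalDist


def conditional_power(inflation, alpha=0.05, nominal_power=0.80):
    """Achieved power when the true SD is `inflation` times the SD that was
    used to size the study (normal approximation; hypothetical helper)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(nominal_power)
    # Final line of the derivation: quantile of the achieved power
    z_achieved = z_alpha * (1 / inflation - 1) + z_beta / inflation
    return z.cdf(z_achieved)


if_s = 27.41 / 20  # upper 95% one-sided limit over the pilot SD, as quoted in the text
power = conditional_power(if_s)
print(round(power, 3))
```

With an inflation factor of 1 the function returns the nominal power, which is a quick way to confirm that the algebra is wired up correctly.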

Considering the combined sample size

Kieser and Wassmer [13] argued that the size of a pilot study should be chosen so as to minimize the combined sample size of the pilot and main studies. Therefore, one approach to determining the size of the pilot study is to calculate the combined sample size required for two different sizes of pilot study. If there is little difference in the combined sample size, the smaller pilot study makes more sense as the correspondingly larger main study will provide a more precise estimate of the true
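One way to explore this trade-off is to tabulate the combined size over a range of pilot sizes. The sketch below is a simplified illustration under several assumptions of ours, not the authors' or Kieser and Wassmer's procedure: the main study is sized on the upper one-sided 95% confidence limit of the pilot SD, the pilot has df = n - 1, the observed standardized effect (here 8 / 20 = 0.4) does not depend on the pilot size, and sample sizes are left unrounded. All function names are hypothetical.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2, x / 2
    term = total = 1 / a
    for k in range(1, 1000):
        term *= half_x / (a + k)
        total += term
        if term < total * 1e-15:
            break
    return total * exp(-half_x + a * log(half_x) - lgamma(a))


def chi2_quantile(p, df):
    """Invert the CDF by bisection (the CDF is monotone in x)."""
    lo, hi = 0.0, df + 20 * sqrt(2 * df) + 20
    for _ in range(80):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def combined_size(n_pilot, effect_size, alpha=0.05, power=0.80, confidence=0.95):
    """Pilot size plus total (two-group) main-study size, with the main
    study sized on the upper confidence limit of the pilot SD."""
    z = NormalDist()
    n_unadjusted = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / effect_size ** 2
    df = n_pilot - 1  # assumed degrees of freedom
    inflation_sq = df / chi2_quantile(1 - confidence, df)
    return n_pilot + 2 * n_unadjusted * inflation_sq


sizes = range(10, 201, 2)  # candidate pilot sizes (even, for two equal groups)
best = min(sizes, key=lambda n: combined_size(n, effect_size=0.4))
for n in (20, best, 100):
    print(n, round(combined_size(n, effect_size=0.4)))
```

The combined size is typically quite flat around its minimum, so a range of pilot sizes near `best` is nearly as efficient.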

Discussion

A small pilot study can potentially lead to a trial that is severely underpowered to detect the specified effect, if the SD in the main study proves to be at or near the upper CL. Thus, if an SD of 20 was derived from a pilot study of n = 20, the SD in the main study could plausibly be as large as 27.41 (based on an upper 95% one-sided CL). An RCT based on the SD estimate of 20 could thereby underrecruit by about half in the main study, only achieving approximately 54% statistical power.

Conclusion

This study provides evidence that small pilot studies are liable to produce imprecise estimates of the SD for a power calculation. Therefore, from a statistical perspective, the size of a pilot study should be calculated in relation to the desired level of confidence for the SD and the chosen power and significance level of the analysis in the main study; at a high level of confidence, a pilot study of at least n = 50 is advisable in many circumstances.

While in general, trialists should seek to

References (18)

  • [1] J. Wittes. Sample size calculations for randomized controlled trials. Epidemiol Rev (2002)
  • [2] A.J. Vickers. Underpowering in randomized trials reporting a sample size calculation. J Clin Epidemiol (2003)
  • [3] G.A. Lancaster et al. Design and analysis of pilot studies: recommendations for good practice. J Eval Clin Pract (2004)
  • [4] S.A. Julious. Sample size of 12 per group rule of thumb for a pilot study. Pharm Stat (2005)
  • [5] M.A. Birkett et al. Internal pilot studies for estimating sample size. Stat Med (1994)
  • [6] J. Wittes et al. The role of internal pilot studies in increasing the efficiency of clinical trials. Stat Med (1990)
  • [7] D.M. Zucker et al. Internal pilot studies II: comparison of various procedures. Stat Med (1999)
  • [8] T. Friede et al. Sample size recalculation in internal pilot study designs: a review. Biom J (2006)
  • [9] D. Machin et al. Sample size tables for clinical studies (2009)
There are more references available in the full text version of this article.
