The size of a pilot study for a clinical trial should be calculated in relation to considerations of precision and efficiency

doi:10.1016/j.jclinepi.2011.07.011

Journal of Clinical Epidemiology

Volume 65, Issue 3, March 2012, Pages 301-308

Abstract

Objective

To investigate methods to determine the size of a pilot study to inform a power calculation for a randomized controlled trial (RCT) using an interval/ratio outcome measure.

Study Design

Calculations based on confidence intervals (CIs) for the sample standard deviation (SD).

Results

Based on CIs for the sample SD, methods are demonstrated whereby (1) the observed SD can be adjusted to secure the desired level of statistical power in the main study with a specified level of confidence; (2) the sample for the main study, if calculated using the observed SD, can be adjusted, again to obtain the desired level of statistical power in the main study; (3) the power of the main study can be calculated for the situation in which the SD in the pilot study proves to be an underestimate of the true SD; and (4) an “efficient” pilot size can be determined to minimize the combined size of the pilot and main RCT.

Conclusion

Trialists should calculate the appropriate size of a pilot study, just as they should the size of the main RCT, taking into account the twin needs to demonstrate efficiency in terms of recruitment and to produce precise estimates of treatment effect.

Introduction

What is new?

• Small pilot studies may provide imprecise estimates of the standard deviation (SD), and resulting power calculations for the main study may be correspondingly imprecise, but guidance on the appropriate size of a pilot study is sparse.
• Basing the power calculation on a value at the upper confidence limit for the SD can provide reassurance that the main study will have the desired level of statistical power.
• An inflation factor can be used to calculate this adjusted value for the SD, in relation to a pilot study of a given size, and can also be used to determine an adjusted sample size.
• The size of a pilot study for a randomized controlled trial using an interval or ratio outcome measure should be determined through a calculation based on the precision of the estimate of the SD.

A sample size calculation for a randomized controlled trial (RCT) is undertaken to estimate the minimum number of participants required to detect as significant a prespecified effect, with a stated level of statistical power and at a chosen significance level [1]. Here, the significance level equates to the risk of incorrectly rejecting the null hypothesis (a type 1 error), and power is the probability of detecting as statistically significant an effect of a specified magnitude, if it exists; this is equivalent to the probability of avoiding a type 2 error.

Where the outcome variable of interest is measured on an interval/ratio scale and the effect in question is a mean difference, the sample size calculation depends in part on the value of the standard deviation (SD) of the outcome variable in the main RCT. This value is unknown and is often estimated by the SD from a pilot study; this process is equivalent to the estimation of a population parameter. However, given that the pilot SD is a random variable, it may be an under- or overestimate of the SD in the main RCT. Accordingly, an RCT may turn out to be under- or overpowered to detect the specified effect, owing to the SD of data in the trial (on which the hypothesis test is based) being respectively larger or smaller than the value used in the prior sample size calculation. Under- or overestimation of the SD in the main RCT may arise for two reasons. First, the estimate used in the calculation may not be appropriate for the clinical population in which the trial is conducted (e.g., it was derived from a previous study of patients whose age, chronicity, or symptom severity differed from that of the patients in the RCT). That is, the SD used in the sample size calculation may be biased (systematic error). Second, as a random variable, the SD used in the calculation may have under- or overestimated the SD in the main RCT simply through sampling fluctuation (random error).

A pilot study can help to remedy the problem of bias in the estimate of the SD as it can be conducted on the same clinical population as will be included in the subsequent RCT. However, the estimate of the SD from a pilot study may still under- or overestimate the SD in the main RCT through random error. The more pressing concern is the possibility of underestimation of the SD, with the consequence of underestimation of the sample size for the main RCT. This appears to be a common phenomenon [2] and prevents clear conclusions from being drawn from the individual RCTs concerned. It is, therefore, important that a pilot study provides an acceptably precise estimate of the SD so as to reduce the likelihood that the trial is underpowered to detect the prespecified clinical difference. In essence, an acceptably precise estimate of the SD requires the pilot study itself to be of sufficient size. It has been suggested that n = 30 is an acceptable size for a pilot study [3]. With specific reference to estimates of the SD, Julious [4] proposes at least n = 12 per group, equivalent to n = 24 for a traditional two-group study, a figure similar to that proposed by other authors [5], [6]. However, there is otherwise little in the way of specific guidelines on the appropriate size of a pilot study.

The aim of this article is twofold: (1) to guide trialists, at the developmental stage of their research, as to the appropriate size of a pilot study to gain sufficiently precise estimates of the true SD and (2) to help inform trialists, post-pilot, on how to adjust their estimate of the SD, for purposes of the sample size calculation of the main trial, to be confident of not underpowering their main study. Our focus is on the situation in which a prior pilot study is conducted independently of the main RCT, rather than on that in which an internal pilot study is performed, that is, where the required sample size is recalculated on the basis of an estimate of the SD derived from the first patients recruited to the main RCT [5], [6], [7], [8].

Illustrative example

Suppose an RCT is being designed to detect a mean difference in systolic blood pressure of at least 8 mm Hg between two treatment groups. A pilot study is conducted and provides an estimated pooled SD on this scale of 20 mm Hg. Based on this estimate—and assuming 80% power and a 5% two-tailed significance level—100 participants per group would be needed for the analysis of the main study [9].
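The arithmetic behind this example can be sketched with the usual normal-approximation formula for comparing two means, n ≈ 2(z_{1−α/2} + z_{1−β})² s² / δ² per group. The function name `n_per_group` and its parameters are illustrative only and are not taken from the article or from reference [9].

```python
import math
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided
    comparison of two means (illustrative helper, not from the article)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-tailed test
    z_beta = z.inv_cdf(power)           # quantile matching the nominal power
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2


raw = n_per_group(sd=20, delta=8)
print(round(raw, 1), math.ceil(raw))  # about 98.1, rounded up to 99
```

The approximation gives 99 per group, close to the 100 per group quoted from the published tables; the small gap may reflect rounding or a small-sample correction in those tables, which is an assumption here rather than something the article states.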

However, the SD for the main study could be either larger or smaller than that estimated in the pilot study,

Standard deviation

To have, for example, 95% confidence of obtaining at least the nominal power of the statistical test in the main study, the researcher must base the sample size of the main study on the upper 95% one-sided CL for the SD from the pilot study. Hence, in relation to the example in the previous section, and assuming a pilot study of n = 20 and a nominal power of 80%, it can be calculated that to be 95% confident of achieving at least the nominal power of the statistical test in the main study, an SD

Conditional power of a study

It is possible to calculate the achieved (conditional) power of a main study, based on various values of IF_s, if the true SDs were equal to the upper 95% CL, but the sample size was determined according to the point estimate of the SD from a pilot study. The relevant formula is derived as follows:

$n \approx \frac{2 (z_{1 - \alpha / 2} + z_{1 - \beta'})^{2} (IF_{s} \times s)^{2}}{|\mu_{1} - \mu_{2}|^{2}} = \frac{2 (z_{1 - \alpha / 2} + z_{1 - \beta})^{2} s^{2}}{|\mu_{1} - \mu_{2}|^{2}}$

$\Rightarrow (z_{1 - \alpha / 2} + z_{1 - \beta'})^{2} (IF_{s} \times s)^{2} = (z_{1 - \alpha / 2} + z_{1 - \beta})^{2} s^{2}$

$\Rightarrow (z_{1 - \alpha / 2} + z_{1 - \beta'}) (IF_{s} \times s) = (z_{1 - \alpha / 2} + z_{1 - \beta}) s$

$\Rightarrow z_{1 - \beta'} \times IF_{s} = z_{1 - \alpha / 2} (1 - IF_{s}) + z_{1 - \beta}$

$\Rightarrow z_{1 - \beta'} = z_{1 - \alpha / 2} (IF_{s}^{-1} - 1) + z_{1 - \beta} \times IF_{s}^{-1}$
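As a minimal sketch, the final line of this derivation can be evaluated directly. Here 1 − β is read as the nominal power and 1 − β′ as the achieved power, which is an interpretation of the notation; the function name `conditional_power` is hypothetical, and the inflation factor is taken as 27.41/20, the ratio of the upper confidence limit to the pilot SD quoted in the Discussion.

```python
from statistics import NormalDist


def conditional_power(inflation, alpha=0.05, nominal_power=0.80):
    """Achieved power when the true SD is `inflation` times the SD that was
    used to size the main study (last line of the derivation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(nominal_power)
    # z_{1-beta'} = z_{1-alpha/2} (IF^-1 - 1) + z_{1-beta} IF^-1
    z_achieved = z_alpha * (1 / inflation - 1) + z_beta / inflation
    return z.cdf(z_achieved)


if_s = 27.41 / 20  # upper 95% CL over pilot SD, figures from the Discussion
print(round(conditional_power(if_s), 2))  # about 0.53
```

With an inflation factor of 1 the function returns the nominal power, as the derivation implies. The value of about 0.53 is in line with the roughly 54% reported in the Discussion; the small difference is consistent with the main study size having been rounded to 100 per group.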

Considering the combined sample size

Kieser and Wassmer [13] argued that the size of a pilot study should be chosen so as to minimize the combined sample size of the pilot and main studies. Therefore, one approach to determining the size of the pilot study is to calculate the combined sample size required for two different sizes of pilot study. If there is little difference in the combined sample size, the smaller pilot study makes more sense as the correspondingly larger main study will provide a more precise estimate of the true
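One way to explore this trade-off is to tabulate the combined size for several candidate pilot sizes. The sketch below rests on assumptions that are not taken from the article: the pilot SD is held at 20 mm Hg whatever the pilot size, the main study is sized on the upper one-sided 95% confidence limit for the SD with n − 1 degrees of freedom, and the chi-square quantile is obtained from the Wilson-Hilferty approximation rather than an exact routine. All function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (not exact)."""
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + Z.inv_cdf(p) * math.sqrt(c)) ** 3


def inflation_factor(pilot_n, confidence=0.95):
    """Ratio of the upper one-sided confidence limit for the SD to the pilot SD."""
    df = pilot_n - 1  # assumption: the pilot is treated as one sample of size n
    return math.sqrt(df / chi2_quantile_wh(1.0 - confidence, df))


def combined_size(pilot_n, sd, delta, alpha=0.05, power=0.80, confidence=0.95):
    """Pilot size plus a two-group main study sized on the inflated SD."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    adjusted_sd = inflation_factor(pilot_n, confidence) * sd
    main_per_group = math.ceil(2 * z_sum ** 2 * adjusted_sd ** 2 / delta ** 2)
    return pilot_n + 2 * main_per_group


for pilot_n in (10, 20, 30, 50, 70, 100):
    print(pilot_n, combined_size(pilot_n, sd=20, delta=8))
```

Under these assumptions the combined size first falls and then rises again as the pilot grows, because the reduction in the inflation factor eventually no longer offsets the participants spent on the pilot itself.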

Discussion

A small pilot study can potentially lead to a trial that is severely underpowered to detect the specified effect, if the SD in the main study proves to be at or near the upper CL. Thus, if an SD of 20 was derived from a pilot study of n = 20, the SD in the main study could plausibly be as large as 27.41 (based on an upper 95% one-sided CL). An RCT based on the SD estimate of 20 could thereby underrecruit by about half in the main study, only achieving approximately 54% statistical power.
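These figures can be checked with a short sketch. It assumes that the upper one-sided confidence limit is s × sqrt((n − 1) / χ²_{α, n−1}), where χ²_{α, n−1} is the lower α quantile of the chi-square distribution with n − 1 degrees of freedom; the choice of n − 1 is an assumption, made because it reproduces the quoted value of 27.41. The chi-square quantile is found by bisection on a series expansion of the distribution function, and all function names are illustrative.

```python
import math
from statistics import NormalDist


def chi2_cdf(x, df):
    """Chi-square CDF from the series for the regularized incomplete gamma."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= half_x / (a + k)
        total += term
    log_scale = -half_x + a * math.log(half_x) - math.lgamma(a + 1.0)
    return total * math.exp(log_scale)


def chi2_quantile(p, df):
    """Invert the CDF by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def inflation_factor(pilot_n, confidence=0.95):
    df = pilot_n - 1  # assumption, see lead-in
    return math.sqrt(df / chi2_quantile(1.0 - confidence, df))


def power_two_means(n_per_group, sd, delta, alpha=0.05):
    """Approximate power of a two-sided z-test (far tail ignored)."""
    z = NormalDist()
    return z.cdf(math.sqrt(n_per_group / 2) * delta / sd - z.inv_cdf(1 - alpha / 2))


factor = inflation_factor(20)
upper_sd = factor * 20
print(round(upper_sd, 2))                           # about 27.41
print(math.ceil(100 * factor ** 2))                 # about 188 per group needed
print(round(power_two_means(100, upper_sd, 8), 2))  # about 0.54
```

If the true SD were 27.41, roughly 188 participants per group would be required, so a main study of 100 per group recruits only about half of that number and attains a power of about 54%.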

Conclusion

This study provides evidence that small pilot studies are liable to produce imprecise estimates of the SD for a power calculation. Therefore, from a statistical perspective, the size of a pilot study should be calculated in relation to the desired level of confidence for the SD and the chosen power and significance level of the analysis in the main study; at a high level of confidence, a pilot study of at least n = 50 is advisable in many circumstances.

While in general, trialists should seek to

References (18)

[1] J. Wittes. Sample size calculations for randomized controlled trials. Epidemiol Rev (2002).
[2] A.J. Vickers. Underpowering in randomized trials reporting a sample size calculation. J Clin Epidemiol (2003).
[3] G.A. Lancaster et al. Design and analysis of pilot studies: recommendations for good practice. J Eval Clin Pract (2004).
[4] S.A. Julious. Sample size of 12 per group rule of thumb for a pilot study. Pharm Stat (2005).
[5] M.A. Birkett et al. Internal pilot studies for estimating sample size. Stat Med (1994).
[6] J. Wittes et al. The role of internal pilot studies in increasing the efficiency of clinical trials. Stat Med (1990).
[7] D.M. Zucker et al. Internal pilot studies II: comparison of various procedures. Stat Med (1999).
[8] T. Friede et al. Sample size recalculation in internal pilot study designs: a review. Biom J (2006).
[9] D. Machin et al. Sample size tables for clinical studies (2009).

There are more references available in the full text version of this article.
