Resources for stepped-wedge CRTs

In a SW-CRT, not only is there correlation between observations within the same cluster, but the intervention effect is also confounded with changes in the outcome over time because the intervention condition is on average later in time than the control condition. This has implications for the design, analysis and reporting.
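
The confounding with time described above can be seen in a small, noise-free sketch. It assumes a standard design with one baseline period and one sequence crossing over per period; the function name `stepped_wedge_design`, the trend of 0.5 per period and the true effect of 1.0 are illustrative assumptions, not values from any trial.

```python
import numpy as np

def stepped_wedge_design(n_sequences, clusters_per_sequence=1):
    """Return a clusters x periods 0/1 matrix; 1 = intervention condition.

    Assumes one baseline period with every cluster in control, then one
    sequence crossing over at each subsequent period.
    """
    n_periods = n_sequences + 1
    rows = []
    for s in range(n_sequences):
        row = (np.arange(n_periods) > s).astype(int)
        rows.extend([row] * clusters_per_sequence)
    return np.array(rows)

X = stepped_wedge_design(n_sequences=4)
periods = np.arange(X.shape[1])
period_grid = np.tile(periods, (X.shape[0], 1))

# Intervention observations sit later in calendar time, on average.
mean_period_control = period_grid[X == 0].mean()
mean_period_intervention = period_grid[X == 1].mean()

# Noise-free outcome: a secular trend plus a true intervention effect.
true_effect, trend = 1.0, 0.5
y = trend * period_grid + true_effect * X

# Naive comparison of conditions, ignoring time.
naive = y[X == 1].mean() - y[X == 0].mean()

# Time-adjusted comparison: period indicators plus the treatment indicator.
period_dummies = np.eye(len(periods))[period_grid.ravel()]
design = np.column_stack([period_dummies, X.ravel()])
coef, *_ = np.linalg.lstsq(design, y.ravel(), rcond=None)
adjusted = coef[-1]

print(mean_period_control, mean_period_intervention, naive, adjusted)
```

In this toy setting the naive difference absorbs the secular trend, while the period-adjusted estimate recovers the assumed effect. Real data would also need the clustering to be accounted for.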

This page provides resources specific to conducting a SW-CRT and has some information for the related crossover CRT design, but the resources on CRTs may also be useful.

See What is a SW-CRT? for several recent publications giving an overview of issues in the design, conduct, analysis and reporting of SW-CRTs.

Resources below are separated into sections on Design, Analysis and Reporting.

Design: Introduction

This page provides resources to help with designing a SW-CRT. It may also be useful to look at the resources for designing CRTs, but here we focus on resources specific to conducting a SW-CRT. In particular, we cover: general information, pilot and feasibility studies, ethics, randomisation, sample size and selecting a design, expected inter-period correlation coefficient values, and the crossover design.

Many aspects of design depend on the choice of analysis, so you may want to refer to resources for analysing SW-CRTs.

Design: General information
Design: Pilot and feasibility studies

Pilot and feasibility studies are often useful before conducting a main SW-CRT because of the added complexity of SW-CRTs compared with individually randomised trials.

Design: Ethics

The following papers discuss the ethical considerations when conducting a SW-CRT.

Design: Randomisation

All methods of randomisation that can be used for CRTs can also be used for SW-CRTs. See Resources for designing CRTs.

This article discusses why it is important to randomise in a SW-CRT:

This article gives an example of using covariate constrained randomisation in a SW-CRT (see Resources for designing CRTs for an explanation of covariate constrained randomisation):

Design: Sample size and selecting a design

To calculate the sample size required for a SW-CRT, you must make some assumptions about what your data will look like. In addition to the cluster size and ICC required for the sample size calculation of a parallel CRT, SW-CRTs require assumptions about changes in the correlation in your data over time. SW-CRTs cover a range of designs and the sample size is affected by including a baseline period, the number of sequences, and whether or not observations are collected in all clusters in all periods. Specific methods have been proposed to estimate the sample size and compare different designs in the following situations:

Assuming equal correlation within clusters across all periods with a cross-sectional design

(Note, this is a strong assumption that will usually be inappropriate, but the methods are simple. These methods may underestimate the required sample size):

Assuming other correlation structures and accounting for cohort designs
Using simulation studies

(This is the most flexible method that can incorporate any design or correlation structures):

Designs with transition periods
Designing SW-CRTs with 3 arms
Unequal cluster size
Within-period/vertical analysis
Designs with interim analysis
Sample size re-estimation
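
For the simplest situation listed above (equal correlation within clusters across all periods, cross-sectional design), the variance of the treatment effect can be sketched with generalised least squares on cluster-period means. This is a rough illustration under stated assumptions, not the method of any particular tool: the model has period fixed effects, a single treatment effect and a random cluster intercept, and the function names, the ICC of 0.05 and the 20 participants per cluster-period are hypothetical choices.

```python
import numpy as np
from statistics import NormalDist

def sw_design(n_sequences, clusters_per_sequence):
    """Standard stepped-wedge layout: baseline period, one sequence steps per period."""
    n_periods = n_sequences + 1
    rows = [(np.arange(n_periods) > s).astype(float)
            for s in range(n_sequences)
            for _ in range(clusters_per_sequence)]
    return np.array(rows)

def treatment_variance(X, n_per_cell, icc, total_var=1.0):
    """Variance of the GLS treatment-effect estimator from cluster-period means.

    Assumes a random cluster intercept only, so every pair of individuals in
    a cluster has the same correlation (the ICC) regardless of period.
    """
    n_clusters, n_periods = X.shape
    tau2 = icc * total_var          # between-cluster variance
    sigma2 = (1 - icc) * total_var  # individual-level variance
    # Covariance matrix of the period means within one cluster.
    V = tau2 * np.ones((n_periods, n_periods)) + (sigma2 / n_per_cell) * np.eye(n_periods)
    V_inv = np.linalg.inv(V)
    info = np.zeros((n_periods + 1, n_periods + 1))
    for i in range(n_clusters):
        # Design for cluster i: period indicators, then the treatment indicator.
        Z = np.column_stack([np.eye(n_periods), X[i]])
        info += Z.T @ V_inv @ Z
    return np.linalg.inv(info)[-1, -1]

def power(effect, variance, alpha=0.05):
    """Approximate two-sided power from a normal approximation."""
    z = NormalDist()
    return z.cdf(abs(effect) / variance ** 0.5 - z.inv_cdf(1 - alpha / 2))

X = sw_design(n_sequences=4, clusters_per_sequence=3)
var = treatment_variance(X, n_per_cell=20, icc=0.05)
print(power(effect=0.3, variance=var))  # effect in units of the total SD
```

Because the equal-correlation assumption is usually too strong, a calculation like this is best treated as a starting point and checked against the methods for other correlation structures.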
Design: Tools and software

Various practical resources have been produced for calculating sample size in SW-CRTs:

Excel tool when assuming equal correlation within clusters across all periods
Stata command “steppedwedge” when assuming equal correlation within clusters across all periods
R package “SWSamp” for using simulation based sample size calculations
Online tool for comparing designs
R package “swCRTdesign” for the design and analysis of SW-CRTs
Multi-arm SW-CRT design R code
Design: Expected inter-period correlation coefficient and autocorrelation values

Sample size calculations for SW-CRTs require postulating a value for the inter-period correlation coefficient and a correlation structure for observations over time. In the following paper, Martin et al. have published some values:

Design: Crossover CRTs

SW-CRTs are a modification of the crossover design. Recommendations about the design and sample size of cluster crossover trials can be found here:

Analysis: Introduction

There are three main considerations in the analysis of SW-CRTs beyond those of the parallel, individually randomised trial:

  1. Accounting for clustering. As in other CRTs, observations in the same cluster are likely to be more similar to one another than observations in different clusters, and this needs to be accounted for in the analysis. Methods include generalised linear mixed effect models, generalised estimating equations (GEE), and cluster summary analyses (see resources for analysing CRTs for more information).

  2. Accounting for changes in the outcome over time. In a SW-CRT, the intervention condition observations are on average later in time than the control observations, so the intervention effect is confounded with time and the analysis must account for this, either by adjusting for time or conditioning on time (known as a vertical analysis).

  3. Accounting for changes in correlation over time. Observations collected close together in time are likely to be more similar to one another than observations collected further apart in time. This can be accounted for using generalised linear mixed effect models, or generalised estimating equations (GEE).
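
Point 3 can be made concrete by writing out the assumed correlation between two different individuals from the same cluster as a matrix indexed by their periods. The sketch below shows three commonly discussed structures for a cross-sectional design; the function names and the parameter values (ICC 0.1, cluster autocorrelation 0.8, decay 0.5) are illustrative assumptions.

```python
import numpy as np

def exchangeable(n_periods, icc):
    """Same correlation whatever the two periods are."""
    return np.full((n_periods, n_periods), icc)

def block_exchangeable(n_periods, icc, cac):
    """Within-period correlation `icc`; between-period correlation icc * cac.

    `cac` is the cluster autocorrelation: how much of the within-period
    correlation persists between any two different periods.
    """
    R = np.full((n_periods, n_periods), icc * cac)
    np.fill_diagonal(R, icc)
    return R

def discrete_time_decay(n_periods, icc, decay):
    """Correlation shrinks by a factor `decay` for each period of separation."""
    idx = np.arange(n_periods)
    lag = np.abs(np.subtract.outer(idx, idx))
    return icc * decay ** lag

print(exchangeable(3, 0.1))
print(block_exchangeable(3, 0.1, cac=0.8))
print(discrete_time_decay(3, 0.1, decay=0.5))
```

Entry (j, k) of each matrix is the correlation assumed between an individual observed in period j and a different individual from the same cluster observed in period k.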

SW-CRTs provide information about the intervention from two types of comparisons. Most analyses combine both of these comparisons:

  • Horizontal, or within-cluster comparisons compare observations in the control condition to observations in the intervention condition within each cluster. These comparisons are confounded with time and the amount of information they can provide depends on the correlations over time within the clusters.

  • Vertical, or within-period comparisons compare observations in the control condition to observations in the intervention condition within each period of the study. Each of these comparisons is a randomised comparison and is not confounded with time; they are like CRTs contained within the SW-CRT.
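
The two types of comparison can be contrasted in a minimal, noise-free sketch on cluster-period means. The data, the function names and the weights used to combine the within-period differences are all illustrative assumptions; in particular, the weights are one simple choice rather than a recommended method.

```python
import numpy as np

def vertical_estimate(Y, X):
    """Weighted average of within-period differences in cluster means."""
    diffs, weights = [], []
    for j in range(X.shape[1]):
        treated = X[:, j] == 1
        n1, n0 = treated.sum(), (~treated).sum()
        if n1 == 0 or n0 == 0:
            continue  # period with only one condition gives no vertical contrast
        diffs.append(Y[treated, j].mean() - Y[~treated, j].mean())
        weights.append(n1 * n0 / (n1 + n0))  # assumed simple weighting
    return float(np.average(diffs, weights=weights))

def horizontal_estimate(Y, X):
    """Mean within-cluster difference: intervention periods minus control periods."""
    per_cluster = [Y[i, X[i] == 1].mean() - Y[i, X[i] == 0].mean()
                   for i in range(X.shape[0])]
    return float(np.mean(per_cluster))

# Four clusters, five periods, one cluster crossing over per period.
X = np.array([(np.arange(5) > s).astype(int) for s in range(4)])
# Noise-free cluster-period means: trend of 0.5 per period, true effect 1.0.
Y = 0.5 * np.arange(5) + 1.0 * X

print(vertical_estimate(Y, X))    # unaffected by the secular trend
print(horizontal_estimate(Y, X))  # absorbs part of the secular trend
```

With real data the within-period estimates are correlated with each other, so any standard error for the combined vertical estimate has to account for that.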

This page provides resources to help with analysing a SW-CRT. It may also be useful to look at the resources for analysing CRTs, but here we focus on SW-CRTs. In particular, we present some general information, followed by resources for vertical analysis, generalised linear mixed models, generalised estimating equations, and permutation tests, and special considerations for time-to-event outcomes, modelling different aspects of the intervention effect, number of clusters required, causal inference methods, and the crossover design.

Analysis: General information

The following references provide general information on considerations for selecting an analysis method:

Analysis: Vertical analysis

Some researchers have suggested using an analysis method that only utilises the vertical comparisons because these comparisons are not confounded with time. Combining the estimates from each vertical comparison must account for the correlation of observations in each period. The following references provide details of how this can be done:

Analysis: Generalised linear mixed models

Generalised linear mixed models combine vertical and horizontal comparisons. The following papers provide information on generalised linear mixed models for stepped wedge trials:

Analysis: Generalised estimating equations

The following paper provides information on generalised estimating equations for stepped wedge trials:

Analysis: Permutation tests

Permutation methods can be used for SW-CRTs. The following provides an introduction to permutation tests:

The following is an R package, “permute”, for generating permutations in R:

The following is a Stata command, “swpermute”, for conducting permutation tests for SW-CRTs:

The papers below provide details on permutation methods for SW-CRTs:
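
As a rough sketch of the general idea only (this is not the interface of “permute” or “swpermute”), a permutation test for a SW-CRT can re-randomise clusters to sequences and recompute a time-adjusted effect estimate each time. The function names, the simulated data and the choice of test statistic below are assumptions made for illustration.

```python
import numpy as np

def adjusted_effect(Y, X):
    """Treatment coefficient from least squares with period indicators."""
    n_clusters, n_periods = X.shape
    period_dummies = np.tile(np.eye(n_periods), (n_clusters, 1))
    design = np.column_stack([period_dummies, X.ravel()])
    coef, *_ = np.linalg.lstsq(design, Y.ravel(), rcond=None)
    return coef[-1]

def permutation_p_value(Y, X, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the adjusted effect."""
    rng = np.random.default_rng(seed)
    observed = adjusted_effect(Y, X)
    count = 0
    for _ in range(n_perm):
        # Reassign clusters to sequences by shuffling the rows of the design.
        X_perm = X[rng.permutation(X.shape[0])]
        if abs(adjusted_effect(Y, X_perm)) >= abs(observed) - 1e-12:
            count += 1
    return observed, (1 + count) / (1 + n_perm)

# Simulated cluster-period means: 12 clusters, 4 sequences, 5 periods.
rng = np.random.default_rng(1)
X = np.array([(np.arange(5) > s).astype(float)
              for s in range(4) for _ in range(3)])
period_trend = 0.5 * np.arange(5)
cluster_effect = rng.normal(0, 0.3, size=(12, 1))
Y = period_trend + cluster_effect + 2.0 * X + rng.normal(0, 0.2, size=X.shape)

estimate, p_value = permutation_p_value(Y, X)
print(estimate, p_value)
```

Shuffling whole rows keeps each cluster's outcomes together and only changes which sequence it is treated as having received, mirroring the original randomisation. If the randomisation was stratified or constrained, the permutations would need to respect the same restrictions.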

Analysis: Time to event outcomes
Analysis: Modelling different aspects of the intervention effect

The SW-CRT design can allow modelling of a changing intervention effect:

Analysis: Number of clusters required
Analysis: Causal inference

As with CRTs, in a SW-CRT it is possible to estimate a complier average causal effect:

Analysis: Crossover CRTs

SW-CRTs are a modification of the crossover design. Methods for the analysis of crossover CRTs are described in the following:

Reporting: Introduction

Adequate reporting of the trial results is essential to ensure transparency and reproducibility of findings. This page provides resources to help with the reporting of SW-CRTs.

Reporting: Guidelines

The Consolidated Standards of Reporting Trials (CONSORT) statement includes a checklist of items that should be included in the report of an individually randomised trial. An extension for CRTs is available (see resources for the reporting of CRTs). Many items in these statements are applicable to a SW-CRT and should be adhered to. The first discussion of reporting guidelines and the need for a set of recommendations for SW-CRTs can be found in the following paper:

An extension to the CONSORT guidelines for SW-CRTs has recently been published. The protocol for this extension, and the explanation and elaboration paper can be found at:

Some of the key criteria for reporting are listed below:

Trial design

Description and schematic of trial design including definition of cluster, number of sequences, number of clusters randomised to each sequence, number of periods, and duration of time between each step. It should be reported whether the participants assessed in different periods from the same cluster are the same people, different people, or a mixture.

Sample size

The method of calculation and relevant parameters should be reported with sufficient detail so that the calculation is replicable. Any assumptions made about the correlations between outcomes of participants within the same cluster should be noted.

Methods of analysis

Statistical methods used to compare treatment conditions for primary and secondary outcomes including how time effects, clustering and repeated measures were taken into account.


Dates defining the recruitment and follow-up for participants, steps, set up of the intervention and deviations from planned dates.


For each primary and secondary outcome, results for each treatment condition, and the estimated effect size and its precision (such as 95% confidence interval); any correlations and time effects estimated in the analysis.

The SW-CRT contains many unique design features. An accurate trial report should include the reporting of each of these key concepts, which can be found in detail below:

Effect of time on the design

Detailed description: In a SW-CRT, clusters are randomised to different sequences which dictate at what date they will initiate the intervention. Observations collected under the control condition are therefore, on average, from an earlier calendar time than observations collected under the intervention condition.

Why this is important: Changes external to the trial may create underlying secular trends. In addition, where participants are repeatedly assessed over the duration of the trial, participants might get sicker over the study. This means that time is a potential confounder. Analysis should allow for the confounding effect of time.

Recruitment of participants

Detailed description: SW-CRTs make a series of measurements over time within each cluster. These repeated measurements can be on the same participants, different participants at each measurement, or a mixture of the same and different participants.

Why this is important: Analysis (and consequently sample size calculations) should allow for the fact that data are not independent (i.e. repeated measures from the same cluster and possibly repeated measures on the same participants).

Sampling of observations

Detailed description: SW-CRTs can take a complete enumeration of the cluster or take a random sample of individuals from the cluster. Furthermore, participants might be recruited continuously as they present, or all participants might be recruited at the beginning of the trial. Some trials do not recruit participants at all, but rather use only their outcome data.

Why this is important: Information on how observations were sampled is needed to assess the risk of bias. Studies that take a complete enumeration are at lower risk of bias, as are studies that recruit all participants at a fixed point in time before randomisation has occurred. Studies that recruit continuously are at higher risk of bias (due to the potential for identification and recruitment biases).

Duration of exposure

Detailed description: In SW-CRTs some or all of the clusters will be exposed to both the control and intervention condition. Individuals can have either a relatively short exposure to the intervention (e.g. a surgical intervention) or a long exposure (e.g. a change in care home policy). Individuals might be exposed to only one of the treatment conditions, or to both.

Why this is important: Where the duration of exposure is short, it is unlikely that individuals will be exposed to both the control and intervention condition. Where the duration of exposure is long, it may be possible to sample the same individuals under the control condition and later under the intervention condition. In trials with long exposure and delayed assessment of outcome, this could mean that participants recruited under the control condition become exposed to the intervention.

Continuous or discrete time measurements

Detailed description: Observations may be accrued continuously in time (e.g., as patients present to an emergency department and provide measurements after a follow-up period); or in discrete time (e.g., a survey questionnaire may be implemented at several discrete points in time).

Why this is important: Where observations are accrued in continuous time, time will be measured as a continuous variable; where observations are accrued in discrete time, time will be measured as a categorical variable. This can have implications for choice of analysis.

Reporting: Current standard

Previous systematic reviews have highlighted the poor level of reporting in SW-CRTs, particularly in the areas of sample size, analysis methods, and ethical conduct. The resources below indicate the quality of reporting in specific areas.