Introduction A coeliac disease (CD) diagnosis is likely in children with levels of tissue transglutaminase autoantibodies (anti-TG2) >10 times the upper reference value, whereas children with lower anti-TG2 levels need an intestinal biopsy to confirm or rule out CD. A blood sample is easier to obtain than an intestinal biopsy sample, and stabilised blood is suitable for routine diagnostics because transcript levels are preserved at sampling. Therefore, we investigated gene expression in stabilised whole blood to explore the possibility of gene expression-based diagnostics for the diagnosis and follow-up of CD.
Design We performed RNA sequencing of stabilised whole blood from active CD cases (n=10), non-CD cases (n=10), and treated CD cases on a gluten-free diet (n=10) to identify diagnostic CD biomarkers and pathways involved in CD pathogenesis.
Results No single gene was differentially expressed between the sample groups. However, by using gene set enrichment analysis (GSEA), significantly differentially expressed pathways were identified in active CD, and these pathways involved the inflammatory response, negative regulation of viral replication, translation, as well as cell proliferation, differentiation, migration, and survival. The results indicate that there are differences in pathway regulation in CD, which could be used for diagnostic purposes. Comparison between GSEA results based on stabilised blood with GSEA results based on small intestinal biopsies revealed that type I interferon response, defence response to virus, and negative regulation of viral replication were identified as pathways common to both tissues.
Conclusions Stabilised whole blood is not a suitable sample for clinical diagnostics of CD based on single genes. However, diagnostics based on a pathway-focused gene expression panel may be feasible, but requires further investigation.
- coeliac disease
- gene expression
- molecular biology
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Statistics from Altmetric.com
What is already known about this subject?
Paediatric coeliac disease (CD) diagnosis still requires sampling and evaluation of small intestinal biopsies in many cases.
Gene expression in small intestinal biopsies differs between patients with active CD and patients with non-CD.
Blood is generally a more accessible sampling material compared with small intestinal biopsies.
Stabilised blood is suitable for routine gene expression-based diagnostics because transcript levels are preserved at sampling.
What are the new findings?
Stabilised whole blood is not a suitable sample for clinical diagnostics of CD based on single genes.
Pathways of potential interest in CD have been identified.
How might it impact on clinical practice in the foreseeable future?
The results provide a potential starting point for the development of pathway-focused gene expression panels for CD.
These findings add to the knowledge of CD pathogenesis.
Small intestinal biopsies have long been an essential part of coeliac disease (CD) diagnostics, but based on the recommendations from the European Society for Paediatric Gastroenterology, Hepatology, and Nutrition in 2012, the intestinal biopsy may be excluded from the diagnostic flow for symptomatic children with high levels (>10 times the upper reference value, URV) of tissue transglutaminase autoantibodies (anti-TG2) if additional requirements are met.1 In symptomatic children with anti-TG2 levels that are 1–10 times the URV, an intestinal biopsy is needed to confirm or rule out CD as a diagnosis. If the biopsy sample is Marsh grade 0 or 1, the diagnosis is unclear and further investigations are recommended.1 In symptomatic IgA-competent patients with anti-TG2 <1 times the URV, a CD diagnosis is less likely, but not ruled out.1 Additional biomarkers would aid in diagnosing CD; we previously investigated gene expression in small intestinal biopsies by RNA sequencing and found differences in gene expression in active CD cases versus non-CD cases that were detectable also in CD cases with no or low-grade intestinal injuries (Marsh 0–1).2 However, blood is a more accessible sampling material than small intestinal biopsies that can be sampled repeatedly to both diagnose and follow-up with patients with CD. RNA sequencing of CD4+ T cells and whole genome microarray transcriptome analysis of peripheral blood mononuclear cells from mainly adult CD cases have identified genes and pathways of interest in CD.3–5 Because stabilised whole blood preserves transcript levels during collection, transport, and storage,6 the use of stabilised whole blood would facilitate the incorporation of gene expression-based diagnostics in clinical practice. We performed RNA sequencing of stabilised whole blood from paediatric patients with active CD, non-CD, and treated CD on a gluten-free diet (GFD) to identify potential CD diagnostic biomarkers and pathways involved in CD pathogenesis.
Children and adolescents (<18 years of age) were referred to Ryhov County Hospital in Jönköping, Sweden with suspected CD or were followed-up after a period on a GFD to verify mucosal recovery. The patients included in this study were enrolled during the years 2009–2014, where biopsy samplings on GFD were still performed as a part of the standard diagnostic process for CD at the hospital. Most patients were referred for small intestinal biopsy due to elevated anti-TG2 regardless of symptoms. Some children with anti-TG2 levels below the URV on a gluten-containing diet (with or without selective IgA deficiency) were referred for biopsy to exclude CD. The patients provided written consent before blood and duodenal biopsy specimens were collected.
Two study groups (table 1) were defined based on subjects on a gluten-containing diet: (1) group CDM3 with a CD diagnosis (histopathologic assessment Marsh 3B–3C and anti-TG2 >7 U/mL), (2) group M0 cleared from CD (Marsh 0 and anti-TG2 ≤7 U/mL). A third study group (table 1) was defined based on CD subjects at follow-up on a GFD: (3) Group CDM0 (Marsh 0 and anti-TG2 ≤7 U/mL). Whole blood RNA samples were sequenced from all study subjects in all groups. Additionally, RNA samples from small intestinal biopsies from subjects in groups M0 and CDM3 were sequenced. These subjects were included as subgroups in a previous study involving RNA sequencing of small intestinal biopsies.2 Based on the previous analysis, one subject in group M0 was more similar to Marsh 3 subjects, and therefore, that subject was excluded from all analyses.
Sample collection and processing
Sera were collected using blood tubes containing polymer gel and clot activator (Becton, Dickinson and Company, Franklin Lakes, New Jersey, USA). Serum levels of anti-TG2 and IgG antibodies against deamidated gliadin were determined using EliA-kits from Thermo Fisher Scientific (Waltham, Massachusetts, USA) according to Bragde et al.7 A cut-off of 7 U/mL was used.
Blood for RNA isolation was collected in Tempus Blood RNA tubes (Life Technologies, Carlsbad, California, USA), kept overnight at 4°C, and then stored at −20°C until RNA isolation. Total RNA from stabilised blood was purified using the Tempus Spin RNA Isolation Reagent kit (Life Technologies) according to the manufacturer’s instructions. RNA concentrations were determined using the Qubit 2.0 Fluorometer and the Qubit RNA BR Assay Kit (Thermo Fisher Scientific). RNA integrity was assessed using the Agilent 2100 Bioanalyzer with the Agilent RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, California, USA), according to the manufacturer’s instructions. Small intestinal biopsies were collected and RNA was isolated as described in Bragde et al.2 Blood was also collected in EDTA tubes (Becton, Dickinson and Company), and DNA was extracted from 350 µL of blood using the Biorobot EZ1 and the EZ1 DNA Blood 350 µL Kit according to the manufacturer’s instructions (Qiagen, Hilden, Germany).
The intestinal biopsies from the subjects were histopathologically assessed using the modified Marsh scale8 9 according to the method described in Bragde et al.7 Additional assessments were available for 10 subjects that were included in a previous study.10
Libraries for RNA sequencing of stabilised whole blood were prepared using TruSeq Stranded Total RNA with Ribo-Zero Globin (Illumina, San Diego, California, USA). RNA was sequenced as described previously in Bragde et al.2
The single nucleotide polymorphism rs2187668 was used to type the human leucocyte antigen (HLA) alpha chain DQA1*05 and beta chain DQB1*02 alleles (HLA-DQ2.5) in cis.2 11
RNA sequencing data were aligned and analysed using RStudio V.126.96.36.199 The analysis flow up until differential expression calculation for individual genes was based on Law et al,13 and used functionalities from both limma (V.3.36.2)14 and edgeR (V.3.22.3).15 Sequencing reads were mapped and features (official gene symbols) were counted using RSubread V.1.30.416 and genome build hg38 and annotations (V.85) from Ensembl.17 Entrez ID information and the name of each gene were added using the Bioconductor package Annotationdbi.18 Genes with missing information were excluded from further analyses. Counts were converted to counts per million (cpm). Genes with expression >2 cpm in at least 10 samples were included in subsequent analyses. Multidimensional scaling with distances between samples based on the top 500 genes with the largest SD between all samples was carried out using Glimma19 to visualise sample differences according to the primary condition of interest (Marsh grade) or possible confounding factors. Scaling factors were calculated based on the trimmed mean of the M values. Heteroscedasticity was removed from count data using the voom function in limma. An empirical Bayes moderated t-test, with and without adjustment for multiple testing (5% false discovery rate, FDR20), was used to assess significant differences from zero for each individual contrast; thus, there was no fold-change (FC) cut-off needed. P values <0.05 after adjustment for multiple testing were considered significant.
To identify potential biological processes of interest in CD, gene set enrichment analysis (GSEA) was performed using GSEA software V.3.0 from the Broad Institute,21 22 10 000 gene set permutations, and gene set versions from 1 June 2018 (‘Human_GO_bp_no_GO_iea_symbol.gmt’ and ‘Human_Reactome_June_01_2018_symbol.gmt’). The absolute value of the logarithm of the p value for each gene was multiplied by the direction (sign) of the FC, thus resulting in a ranked list of genes with upregulated genes in the top and downregulated genes in the bottom. The list was used as the input for the GSEA (GSEAPreranked), and analysis output was Gene Ontology (GO) Biological Process terms23 24 or Reactome pathways25 containing genes overrepresented (FDR-adjusted p value <0.075) at the top or bottom of the ranked list. Each GO term or Reactome pathway was accompanied by an enrichment score, reflecting the degree to which the gene set was overrepresented at the top or bottom of the ranked list. The enrichment score was adjusted to account for differences in gene set size and in correlations between gene sets and the expression data set, and the adjusted value was termed normalised enrichment score. The results were visualised using EnrichmentMap V.3.1.026 in Cytoscape V.3.6.127 with an edge similarity cut-off (combined Jaccard and Overlap, 50/50) of 0.375. Clusters were identified and annotated using AutoAnnotate V.1.2, and the Markov clustering algorithm (clusterMaker2 V.1.3.128), and clusters were weighted based on similarity coefficients that were calculated based on overlaps between data sets. The word clouds (WordCloud V.3.1.129) that associated with the clusters were adjusted to clarify context.
Deconvolution was performed using immunoStates30 within MetaIntegrator V.2.0.0 with RNA sequencing data formatted as log2-transformed transcripts per million. The resulting cell ratios were investigated for differences between groups using the Kruskal-Wallis one-way analysis of variance by ranks in Statistica V.13.3 (Statsoft).
To compare the GSEA results based on blood samples with the results based on duodenal biopsies, small intestinal biopsy RNA sequencing data from Bragde et al2 were reanalysed using the same analysis pipeline as the blood data.
Power calculations were performed using ssizeRNA V.188.8.131.52
All results are based on RNA sequencing of stabilised whole blood unless otherwise specified. RNA sequencing resulted in a mean of 24.5 million reads per sample (13.1–29.5 million reads), and a mean of 10.7 million reads per sample (6.4–13.2 million reads) were mapped to exons and summarised to gene counts. A total of 56 303 genes were initially included in the analysis, but after filtering and annotation, 12 290 genes were further analysed.
Data set assessment
Unsupervised clustering suggested that gender might influence gene expression. Twenty-two genes were identified as differentially expressed genes (DEGs) based on gender not considering CD status, and 21 of these associated with the X and Y chromosomes (online supplemental file 1). Five of the genes were identified as pseudogenes, including the autosomal gene. Differential expression and pathway analyses of CD produced similar results with and without controlling for gender. Therefore, to avoid unnecessary stratification, the results were not adjusted for gender.
Differential expression on a gene level
Even without an FC cut-off, no significant DEGs were identified in the CDM3 versus M0, CDM3 versus CDM0, and CDM0 versus M0 comparisons. Results for genes with an absolute FC difference ≥1.5 are presented without FDR correction (p value <0.05; online supplemental file 2).
Differential expression on a gene set level
Differences in gene expression in stabilised whole blood from children with active CD (CDM3), treated CD (CDM0), and non-CD children (M0) were further investigated using GSEA with GO terms or Reactome pathways. Significant differential expression differences of genes associated with GO terms (figure 1, online supplemental file 3) and Reactome pathways (figure 2, online supplemental file 4) were identified in the CDM3 versus M0 and CDM3 versus CDM0 comparisons, but not in the CDM0 versus M0 comparison.
Eleven GO terms, related to type I interferon response and defence response to virus, were associated with higher RNA levels in the CDM3 group based on comparisons of CDM3 versus M0 and CDM3 versus CDM0. Thirteen GO terms related to ‘tube formation’, proliferation, nitric oxide biosynthesis, the p38 mitogen-activated protein kinase cascade, and lipid localisation were specific for the CDM3 versus M0 comparison and associated with higher RNA levels in CDM3. Twenty-four GO terms were specific for the CDM3 versus CDM0 comparison, and eight of these GO terms were associated with lower RNA levels in CDM3 and related to ‘protein targeting to membrane/endoplasmic reticulum (ER)’. The remaining 16 GO terms were associated with higher RNA levels in CDM3 and were related to immune response processes and monocarboxylic acid transport.
Two Reactome pathways were identified in both the CDM3 versus M0 and the CDM3 versus CDM0 comparison. Higher RNA levels in CDM3 were associated with ‘interferon alpha beta signalling’ and lower RNA levels in CDM3 associated with ‘signalling by ERBB4’. Eight Reactome pathways were identified for CDM3 versus M0, with higher RNA levels in CDM3 associated with pathway ‘formation of the beta-catenin:T cell factor (TCF) transactivating complex’, and lower RNA levels in CDM3 associated with seven pathways in the cluster ‘fibroblast growth factor receptors (FGFR) signalling’. Three Reactome pathway clusters were identified for CDM3 versus CDM0, and higher RNA levels in CDM3 were associated with ‘activation of GABA B receptors’ and ‘interferon signalling’, and lower RNA levels in CDM3 were associated with the cluster ‘translational processes’ along with several other pathways.
Gene expression in blood compared with small intestinal biopsies
GO terms identified in both blood and small intestinal GSEA (CDM3 versus M0) were associated with type I interferon response, defence response to virus, and negative regulation of viral replication (online supplemental file 3), and Reactome pathways were associated with interferon signalling and formation of the beta-catenin:TCF transactivating complex (online supplemental file 4).
Abundances of blood cell types inferred from deconvolution of RNA sequencing data indicated no significant differences between the groups (online supplemental file 5; p values 0.17–0.99).
On a per gene level, no significant DEGs in stabilised blood were identified in this study. To identify genes suitable for use as biomarkers, we aimed for an FC >3 or FC <−3. In a previous study of gene expression in duodenal biopsies (CDM3 versus M0) using RNA sequencing, 2.5% of the analysed genes were differentially expressed at this FC level.2 Assuming 10 times fewer DEGs in blood, the analysed number of genes (12 290) should render approximately 25 DEGs at this FC level with group sizes corresponding to a power of 80%. However, no DEGs that withstood adjustment for multiple testing were found, indicating that even fewer DEGs are detectable at this FC level in whole blood using RNA sequencing. DEGs have previously been successfully identified in studies of gene expression based on purified cells from CD blood samples.3–5 32 This discrepancy indicates that RNA stabilised whole blood is not a suitable sampling material to identify differential expression in CD, and thus unsuitable for CD diagnostics based on a single or a few genes.
Based on GSEA, our study groups (table 1) differed significantly with respect to several Reactome pathways and GO terms. In this discussion, we chose to focus on the GO terms and Reactome pathways with the most positive/negative normalised enrichment scores because these would have the most diagnostic potential. The results suggest concomitant regulation by both pro-inflammatory and anti-inflammatory processes. The GO term ‘negative regulation of innate immune response’ was enhanced in active CD when compared with treated CD, and a non-significant increase was also seen in active CD when compared with non-CD. Additionally, expression of genes associated with the Reactome pathway ‘cell surface interactions at the vascular wall’ was lower in active CD when compared with treated CD, and a non-significant decrease was also seen in active CD when compared with non-CD. This pathway describes key interactions between platelets and leukocytes and the endothelium in response to injury. Furthermore, clusters of GO terms/Reactome pathways related to translational processes and targeting of proteins to the membrane/ER were associated more with treated CD than active CD, possibly reflecting post-transcriptional processes in the initiation and regulation of inflammation.33 GO terms and Reactome pathways associated with interferon type I (alpha and beta) were enriched in active CD when compared with both non-CD and treated CD, and significant differences in Reactome pathways/GO terms related to type II interferon (interferon gamma) and the JAK-STAT cascade were observed when active CD was compared with treated CD. The JAK-STAT cascade is highly involved in immune responses, many of the JAK-STAT receptors respond to cytokines,34 and JAK-STAT is an important signal transduction pathway for interferon gamma.35 In CD lesions, there is constitutive activation of the STAT pathway,35 and the JAK-STAT pathway was identified in a transcriptome analysis of CD4+ T cells from CD cases by Quinn et al.3 In this study, Reactome pathways related to the activation of GABA B receptors were identified in active CD when compared with treated CD, and GABA B receptors could be involved in the active immune response because they have been shown to be expressed in neutrophils and seem to be involved in neutrophil migration by acting as chemoattractant receptors.36 Additionally, our results indicated that negative regulation of viral replication and defence response to virus are associated with active CD. This is in agreement with the suggestion that the response to viral infection is a contributing factor to the onset of CD.37 The GO term ‘monocarboxylic acid transport’ was enriched in active CD when compared with treated CD. It is of interest to note that levels of faecal short chain fatty acids are altered in children with CD,38 and that short chain fatty acids have been proposed to have anti-inflammatory effects by influencing regulatory T cells.39 Reactome pathway cluster ‘FGFR signalling’, and GO term cluster ‘tube formation’ and the GO term ‘negative regulation of fibroblast proliferation’ were associated with active CD when compared with non-CD, indicating that processes involving cell proliferation, differentiation, migration, and survival could be altered in CD.
Biological processes identified by overlapping GSEA results from blood and small intestinal biopsies included response to type I interferon, defence response to virus, and formation of the beta-catenin:TCF transactivating complex. This indicates that, in both tissues, an immune response similar to the response found against virus infections and activation of transcription through the WNT pathway could be involved in CD.
Even if no single gene biomarker was identified, and the identified pathways were general and associated primarily with inflammation, further studies could still reveal gene expression profiles unique for CD within this biological context. In a study by Tuller et al, where gene expression and protein–protein interactions in six autoimmune diseases were analysed, pathways in common for most or all diseases were identified.40 However, there were also evidence of differential contribution of inflammation-related and more general pathways (eg, interferon and apoptosis), and also that in different diseases the same pathway (eg, apoptosis) was regulated by different mechanisms.40
Any conclusions as to whether the results from this study can be of use for diagnostic purposes cannot be drawn due to the small groups. For this purpose larger studies are needed, including patients at different stages of the disease, and should probably be based on a different sampling material.
In conclusion, our study indicates that the use of stabilised whole blood for identification of single gene expression biomarkers in patients with CD is not optimal. However, by analysing gene expression on a pathway level, it is possible to identify significant differences in expression between active CD, treated CD, and non-CD cases. Our results indicate concomitant regulation of both pro-inflammatory and anti-inflammatory processes in CD, and highlight processes such as transport, and cell proliferation, differentiation, migration, and survival. Parallel analysis of RNA sequencing results from small intestinal biopsies and stabilised blood indicate that factors involved in the defence against viral infections are activated in both tissue types in CD. With regard to CD diagnostics, results from GO term and Reactome pathway analyses could be used to develop pathway-focused gene expression panels that could be used to predict disease status in single samples, and should be investigated further. A good starting point would be to study genes involved in pathways relating to defence against viral infections.
The authors acknowledge support from Science for Life Laboratory, the Knut and Alice Wallenberg Foundation, the National Genomics Infrastructure funded by the Swedish Research Council, and Uppsala Multidisciplinary Center for Advanced Computational Science for assistance with massively parallel sequencing and access to the UPPMAX computational infrastructure. The authors also wish to thank all participating patients, Research Nurse Inga-Lena Hultman at the Department of Paediatrics, and the staff at the Endoscopy Department and the Surgical Department at Ryhov County Hospital, Jönköping, Sweden.
Contributors HGB and JS conceived of the study with support from UJ. HGB, JS and UJ planned the study with support from MF and EG. HGB, UJ and JS collected material for the study. HGB carried out the laboratory work, except for antibody testing, histopathologic assessment, and RNA sequencing. HGB performed the data analysis and interpretation in discussion with JS and with support from UJ, MF and EG. HGB drafted the manuscript, with help from all other authors, especially JS. All authors read and approved the final manuscript.
Funding Financial support was provided by Futurum—the Academy for Health and Care, Region Jönköping County, and by the Medical Research Council of Southeast Sweden.
Competing interests None declared.
Patient consent for publication Not required.
Ethics approval The protocol was approved by the Regional Ethical Review Board in Linköping, Sweden (2011/239-31)
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data are available upon reasonable request. Count data and relevant metadata are available from the corresponding author upon request.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.