Gut microbiome identifies risk for colorectal polyps
Ezzat Dadkhah (1), Masoumeh Sikaroodi (1), Louis Korman (2), Robert Hardi (3), Jeffrey Baybick (2), David Hanzel (4), Gregory Kuehn (5), Thomas Kuehn (5), Patrick M Gillevet (1)
  1. Microbiome Analysis Center, George Mason University, Manassas, Virginia, USA
  2. Capital Digestive Care, Chevy Chase, Maryland, USA
  3. Capitol Research, Bethesda, Maryland, USA
  4. Naked Biome, San Francisco, California, USA
  5. Metabiomics, Aurora, Colorado, USA

Correspondence to Dr Patrick M Gillevet; pgilleve{at}


Objective: To characterise the gut microbiome in subjects with and without polyps and to evaluate its potential as a non-invasive biomarker to screen for risk of colorectal cancer (CRC).

Design: Presurgery rectal swab, home-collected stool, and sigmoid biopsy samples were obtained from 231 subjects undergoing screening or surveillance colonoscopy. 16S rRNA analysis was performed on 552 samples (231 rectal swab, 183 stool, 138 biopsy), and operational taxonomic units (OTUs) were identified using UPARSE. Non-parametric statistical methods were used to identify OTUs that differed significantly between subjects with and without polyps. These informative OTUs were then used to build machine learning classifiers that predict the presence of polyps.
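The abstract does not name the non-parametric test used to pick informative OTUs. As an illustration only, the sketch below assumes a two-sided Mann-Whitney U (Wilcoxon rank-sum) test, computed with a tie-corrected normal approximation, applied OTU by OTU to compare abundances between subjects with and without polyps. The function names, the `alpha=0.05` threshold, the OTU ids, and the absence of any multiple-testing correction are hypothetical choices, not the authors' pipeline.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_with_ties(values: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Return 1-based average ranks and the size of each group of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes: List[int] = []
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U p-value (normal approximation with tie
    correction, no continuity correction)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one sample")
    n = n1 + n2
    ranks, tie_sizes = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance <= 1e-12:
        return 1.0  # every value tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def select_informative_otus(
    otu_table: Dict[str, Sequence[float]],
    has_polyp: Sequence[bool],
    alpha: float = 0.05,
) -> List[str]:
    """Return (hypothetical) OTU ids whose abundance differs between groups.

    otu_table maps an OTU id to one abundance per subject, in the same
    subject order as has_polyp.
    """
    selected = []
    for otu_id, abundances in sorted(otu_table.items()):
        polyp = [a for a, flag in zip(abundances, has_polyp) if flag]
        no_polyp = [a for a, flag in zip(abundances, has_polyp) if not flag]
        if mann_whitney_p(polyp, no_polyp) < alpha:
            selected.append(otu_id)
    return selected


if __name__ == "__main__":
    labels = [True] * 6 + [False] * 6
    toy_table = {  # toy abundances, not study data
        "OTU_A": [10, 12, 11, 13, 14, 15, 1, 2, 1, 0, 2, 3],  # higher with polyps
        "OTU_B": [1, 3, 5, 7, 9, 11, 2, 4, 6, 8, 10, 12],     # no clear difference
    }
    print(select_informative_otus(toy_table, labels))  # ['OTU_A']
```

Tie correction matters for OTU tables because many taxa are absent (zero) in many subjects, which creates large groups of tied values; without it the test would understate the p-value's uncertainty.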

Results: We obtained clinical data on 218 subjects (87 females, 131 males), of whom 193 were White, 21 African-American, and 4 Asian-American. Colonoscopy detected polyps in 56% of subjects. Using informative OTUs, Naïve Bayes and Neural Network models of the non-invasive home stool samples achieved a classification accuracy >75%. A naïve holdout analysis of the home stool samples gave an average false negative rate of 11.5% for the Naïve Bayes and Neural Network models, which fell to 5% when the two models were combined.
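The false negative rate is the share of subjects with polyps that a model misses, FN / (FN + TP). The abstract does not say how the Naïve Bayes and Neural Network predictions were combined. One rule that can only keep or lower the false negative rate is an OR rule: flag a subject as at risk if either model predicts a polyp. The sketch below assumes that rule; `false_negative_rate` and `combine_or` are hypothetical helpers, and the labels in the demo are toy values, not study data.

```python
from typing import List, Sequence


def false_negative_rate(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """FN / (FN + TP): the share of subjects with polyps that a model misses."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp + fn == 0:
        raise ValueError("no subjects with polyps; FNR is undefined")
    return fn / (tp + fn)


def combine_or(pred_a: Sequence[bool], pred_b: Sequence[bool]) -> List[bool]:
    """Flag a subject as positive when either model predicts a polyp (assumed rule)."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have the same length")
    return [a or b for a, b in zip(pred_a, pred_b)]


if __name__ == "__main__":
    T, F = True, False
    # Toy holdout set: 8 subjects with polyps, then 4 without (not study data).
    y_true = [T, T, T, T, T, T, T, T, F, F, F, F]
    naive_bayes = [F, F, T, T, T, T, T, T, F, F, T, F]
    neural_net = [T, F, F, T, T, T, T, T, F, T, F, F]
    combined = combine_or(naive_bayes, neural_net)
    print(false_negative_rate(y_true, naive_bayes))  # 0.25
    print(false_negative_rate(y_true, neural_net))   # 0.25
    print(false_negative_rate(y_true, combined))     # 0.125
```

Because the OR rule turns every disagreement into a positive call, it trades a lower false negative rate for a possibly higher false positive rate: in the toy demo the combined predictions flag two of the four polyp-free subjects, versus one for each model alone.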

Conclusion: Gut microbiome analysis combined with advanced machine learning is a promising approach to screening patients for polyps, with the potential to optimise the use of colonoscopy, reduce the morbidity and mortality associated with CRC, and lower associated healthcare costs.

  • microbiome
  • colorectal cancer
  • polyp
  • biopsy
  • stool
  • sequencing
  • machine learning
  • classification
  • risk assessment

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

  • Presented at Digestive Disease Week, 6–9 May 2017, Chicago, IL, USA (in part). Gastroenterology. 2017 Apr;152 (5, Suppl 1):S152. DOI:

  • Contributors LK, RH, JB, DH, GK, TK, and PMG planned the study. LK, RH, and JB conducted the clinical study. MS and PMG collected the data. ED and PMG interpreted the data and drafted the manuscript. ED and PMG critically revised the manuscript for intellectual content. ED, MS, LK, RH, JB, DH, GK, TK, and PMG approved the final draft submitted.

  • Funding Research reported in this publication was supported in part by Metabiomics.

  • Competing interests GK, TK, and PMG hold founders’ stock, and LK, RH, and JB hold stock options, in Metabiomics.

  • Patient consent for publication Not required.

  • Ethics approval The studies described in this manuscript were approved by Chesapeake IRB (Columbia, MD, USA), protocol Metabiomics MB-01, Metabiomics Neoplasia Clinical Research Study (Pro00008950).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement The data sets analysed during the current study are available from the corresponding author on reasonable request and will be submitted to the Sequence Read Archive.

  • Author note The data for this manuscript have been deposited in GenBank under accession number PRJNA534511.
