

Risk factors for colorectal cancer significantly vary by anatomic site
  1. Joshua Demb1,
  2. Ashley Earles2,
  3. María Elena Martínez1,3,
  4. Ranier Bustamante2,
  5. Alex K Bryant4,
  6. James D Murphy4,
  7. Lin Liu1,2,3,
  8. Samir Gupta1,5,6
  1. Moores Cancer Center, University of California San Diego, La Jolla, California, USA
  2. Department of Research, VA San Diego Healthcare System, San Diego, California, USA
  3. Department of Family Medicine and Public Health, University of California San Diego, La Jolla, CA, United States
  4. Department of Radiation Medicine and Applied Sciences, University of California San Diego, La Jolla, California, USA
  5. Veterans Affairs San Diego Healthcare System, San Diego, CA, United States
  6. Department of Medicine, Division of Gastroenterology, University of California San Diego, La Jolla, CA, United States
  1. Correspondence to Dr Samir Gupta; s1gupta{at}


Objective To conduct an anatomic site-specific case–control study of candidate colorectal cancer (CRC) risk factors.

Design Case–control study of US veterans with ≥1 colonoscopy during 1999–2011. Cases had cancer registry-identified CRC at colonoscopy, while controls were CRC free at colonoscopy and within 3 years of colonoscopy. The primary outcome was CRC, stratified by anatomic site: proximal, distal, or rectal. Candidate risk factors included age, sex, race/ethnicity, body mass index, height, diabetes, smoking status, and aspirin exposure; associations were summarised by adjusted ORs and 95% CIs.

Results 21 744 CRC cases (n=7017 rectal; n=7039 distal; n=7688 proximal) and 612 646 controls were included. Males had significantly higher odds relative to females for rectal cancer (OR=2.84, 95% CI 2.25 to 3.58) than for distal cancer (OR=1.84, 95% CI 1.50 to 2.24). Relative to whites, blacks had significantly lower rectal cancer odds (OR=0.88, 95% CI 0.82 to 0.95), but increased distal (OR=1.27, 95% CI 1.19 to 1.37) and proximal cancer odds (OR=1.62, 95% CI 1.52 to 1.72). Diabetes prevalence was more strongly associated with proximal (OR=1.29, 95% CI 1.22 to 1.36) than distal (OR=1.15, 95% CI 1.08 to 1.22) or rectal cancer (OR=1.12, 95% CI 1.06 to 1.19). Current smoking was more strongly associated with rectal cancer (OR=1.81, 95% CI 1.68 to 1.95) than proximal cancer (OR=1.53, 95% CI 1.43 to 1.65) or distal cancer (OR=1.46, 95% CI 1.35 to 1.57) compared with never smoking. Aspirin use was significantly more strongly associated with reduced rectal cancer odds (OR=0.71, 95% CI 0.67 to 0.76) than distal (OR=0.85, 95% CI 0.81 to 0.90) or proximal cancer odds (OR=0.91, 95% CI 0.86 to 0.95).

Conclusion Candidate CRC risk factor associations vary significantly by anatomic site. Accounting for site may enable better insights into CRC pathogenesis and cancer control strategies.

  • colorectal cancer
  • epidemiology
  • cancer epidemiology

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Summary box

What is already known about this subject?

  • Colorectal cancer (CRC) is the third leading cause of cancer and cancer-related deaths in the world. While many studies have reported on key risk factors associated with CRC risk, such as age, sex, race/ethnicity, body mass index, height, diabetes, smoking status and aspirin exposure, anatomic site-specific differences have not been widely investigated.

What are the new findings?

  • Candidate CRC risk factor associations varied markedly by anatomic subsite. The increased CRC risk among males was most pronounced for rectal cancer. Compared with whites, blacks had reduced risk of rectal cancer, but increased risk of distal and proximal colon cancer. Diabetes was more strongly associated with increased proximal cancer risk than with distal or rectal cancer risk. Compared with never smoking, current smoking was more strongly associated with rectal cancer than with proximal or distal cancer. Aspirin was associated with reduced risk at all three subsites, most strongly for rectal cancer.

How might it impact on clinical practice in the foreseeable future?

  • Our findings indicate that risk factor associations in CRC vary by anatomic subsite, which could further contribute to differences in tumour presentation. Learning more about how risk factors are associated with specific CRC sites may enable better insights into CRC pathogenesis, and guide future cancer control strategies.

Introduction

Colorectal cancer (CRC) is the third leading cause of cancer and cancer-related deaths worldwide.1 CRCs may be divided into three anatomic sites: proximal cancers, generally including cancers of the caecum, ascending colon, hepatic flexure, and transverse colon; distal cancers, including the descending and sigmoid colon; and rectal cancer, where the splenic flexure is variably grouped with either proximal or distal location.2 3 Current research shows that embryologic origins, associated microbial milieu, and tumour characteristics such as mutational signatures and histological features differ by anatomic site.4–6 For example, microsatellite instability is more commonly observed in proximal compared with distal or rectal cancer, and proximal cancers have also been linked more commonly with grade 3/4 cancers and mucinous histology.5 7 8 Though site-specific differences in tumour characteristics are likely in part driven by differences in aetiological factors, most observational studies of CRC risk factors lack the ability to examine risk factors for CRC by specific anatomic sites.

The lack of site-specific analyses to date may have hindered our ability to understand whether some risk factors may be particularly important (or unimportant) for cancer development, and even possibly precluded our ability to identify risk factor associations specific to an anatomic site. More specificity could guide studies of pathogenesis and have implications for cancer control strategies.

In addition to issues with anatomic site-specific case definition, prior case–control studies of CRC risk factors often lack controls with normal colonoscopy documenting absence of polyps or CRC. Because colorectal neoplasia is common, inclusion of controls with undiagnosed neoplasia (such as polyps) in case–control studies of CRC can reduce the ability to accurately estimate risk factor associations, and potentially lead to failures to identify true associations.

To address these gaps in the current CRC literature, our aim was to conduct an anatomic site-specific case–control study of candidate risk factors for CRC with normal colonoscopy controls using large-scale national data from the US Veterans Health Administration (VHA).

Methods

Study design, setting and data sources

We conducted a retrospective case–control study to explore anatomic site-specific risk factors for CRC among US veterans receiving colonoscopy at the VHA. The Department of Veterans Affairs (VA) is one of the largest integrated healthcare providers in the USA, caring for over 6 million veterans annually.9 Since 1999, all VA sites have used an integrated electronic health record (EHR) for documentation of clinical encounters, which, along with additional healthcare resources, can be accessed for research.

The VA Corporate Data Warehouse provides access to discrete data, including demographic characteristics, administrative claims-based diagnosis and procedure codes, prescriptions, and anthropometric measures (eg, weight and height), as well as free-text data, including procedure notes and pathology reports. CRC was ascertained from the VA Central Cancer Registry (VACCR), which has been shown to accurately identify 90% of CRC cases.10 Follow-up was ascertained from the VHA Vital Status File, which includes the date of last visit, defined as the date and time the last vital record was taken by the healthcare provider.11

Study sample and selection criteria

Our study sample consisted of veterans with at least one Current Procedural Terminology (CPT) code for colonoscopy from 1999 to 2011 (see online supplementary appendix table A for codes used). We excluded veterans with a history of CRC based on VACCR entry or an International Classification of Diseases, Ninth Revision (ICD-9) diagnosis code issued ≥6 months prior to baseline colonoscopy, as well as those with an ICD-9 code consistent with inflammatory bowel disease prior to and up to 6 months after baseline colonoscopy. We also excluded individuals with less than 3 years of follow-up. Full inclusion and exclusion criteria are noted in online supplementary appendix table B.

Case selection

Cases were identified by the VACCR and defined using International Classification of Diseases for Oncology, Third Edition (ICD-O-3) site codes for CRC (C18.0, C18.2–C18.7, C19.9, C20.9). For cases diagnosed within 6 months before or after the date of baseline (first) colonoscopy, Surveillance, Epidemiology, and End Results (SEER) programme summary stage and histology were extracted. Online supplementary appendix table C details our selection criteria. We excluded cases with unknown SEER stage, carcinoma in situ, or ICD-O-3 histology codes not consistent with adenocarcinoma. If tumour histology was not specified or not available, cases were still included as long as site, stage, and diagnosis date information was available, given that the majority of CRCs are adenocarcinomas. Cases were stratified by site code as proximal (C18.0, C18.2–C18.4), distal (C18.5–C18.7) or rectal (C19.9, C20.9).
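The site-code stratification above can be expressed as a small lookup. This is a minimal sketch, not the authors' code: the function names, returning None for codes outside the case definition (for example C18.1, appendix), and the use of 183 days as a stand-in for "6 months" are assumptions made for illustration.

```python
from datetime import date
from typing import Optional

# ICD-O-3 topography codes in the study's case definition, grouped by subsite
SUBSITE_CODES = {
    "proximal": {"C18.0", "C18.2", "C18.3", "C18.4"},
    "distal": {"C18.5", "C18.6", "C18.7"},
    "rectal": {"C19.9", "C20.9"},
}


def classify_subsite(icdo3_site: str) -> Optional[str]:
    """Return 'proximal', 'distal' or 'rectal', or None for codes outside
    the case definition (e.g. C18.1 appendix, C18.8, C18.9)."""
    code = icdo3_site.strip().upper()
    for subsite, codes in SUBSITE_CODES.items():
        if code in codes:
            return subsite
    return None


def within_case_window(diagnosis: date, colonoscopy: date,
                       window_days: int = 183) -> bool:
    """True if the diagnosis falls within +/- window_days of the baseline
    colonoscopy; 183 days is an assumed approximation of 6 months."""
    return abs((diagnosis - colonoscopy).days) <= window_days
```

For example, `classify_subsite("C18.2")` gives "proximal" and `classify_subsite("C18.9")` gives None, so unspecified colon sites drop out rather than being forced into a subsite.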

Control selection

Controls were veterans with no prior CRC diagnosis, a normal baseline colonoscopy defined by the presence of a CPT code for diagnostic colonoscopy only (45378 or G0121), and no colon biopsy (evidenced by the absence of a pathology report within 30 days of baseline colonoscopy) (see online supplementary appendix table C for a full outline of study selection criteria). Our prior work has shown that this approach is 96.3% sensitive and 97.5% specific for normal colonoscopy, with a positive predictive value of 97%.12 13 Additionally, to avoid including controls with CRC missed at baseline colonoscopy, we excluded controls with CRC diagnosed by the VACCR or an ICD-9 code within 3 years of baseline colonoscopy. Candidate controls with less than 3 years of follow-up (due to death or loss to follow-up at the VA) were excluded to ensure that controls were CRC free.
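The control definition combines several checks. The sketch below is one hedged reading of it, not the study's code: the record layout (`CandidateControl` and its fields) is hypothetical, the 30-day pathology window is treated as symmetric around the colonoscopy date, and 3 × 365 days is used as an approximation of 3 years. Setting `crc_free_days` to about 182 mirrors the 6-month window used in the sensitivity analysis.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional, Set

# Diagnostic-only colonoscopy codes named in the text
DIAGNOSTIC_ONLY_CPT = {"45378", "G0121"}


@dataclass
class CandidateControl:
    """Hypothetical per-veteran record; field names are illustrative."""
    colonoscopy_date: date
    colonoscopy_cpt: Set[str]
    pathology_report_dates: List[date] = field(default_factory=list)
    crc_diagnosis_date: Optional[date] = None  # registry or ICD-9 based
    last_follow_up: Optional[date] = None


def is_normal_colonoscopy_control(c: CandidateControl,
                                  crc_free_days: int = 3 * 365,
                                  biopsy_window_days: int = 30) -> bool:
    # 1. Only diagnostic colonoscopy codes billed for the baseline exam
    if not c.colonoscopy_cpt or not c.colonoscopy_cpt <= DIAGNOSTIC_ONLY_CPT:
        return False
    # 2. No pathology report near the exam (proxy for "no biopsy taken")
    if any(abs((d - c.colonoscopy_date).days) <= biopsy_window_days
           for d in c.pathology_report_dates):
        return False
    window_end = c.colonoscopy_date + timedelta(days=crc_free_days)
    # 3. Follow-up must cover the whole CRC-free window, else exclude
    if c.last_follow_up is None or c.last_follow_up < window_end:
        return False
    # 4. No CRC before baseline or during the CRC-free window
    if c.crc_diagnosis_date is not None and c.crc_diagnosis_date <= window_end:
        return False
    return True
```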

Candidate risk factors

Candidate risk factors were ascertained based on presence at the time of baseline colonoscopy and included age, sex, race/ethnicity, body mass index (BMI), height, diabetes, smoking status, and aspirin exposure. BMI was characterised using previously developed criteria, which use a median weight derived from 3 years of weight measurements and a single height measurement, and which remove biologically implausible values.14 Diabetes was defined using a previously validated algorithm that included inpatient visits, outpatient visits and medications.15 Smoking status was classified as current, former, never or unknown.10 Aspirin exposure was defined as at least two prescriptions or two mentions of aspirin in free-text notes within 1 year prior to colonoscopy. We have shown that this approach has a positive predictive value of 99.2% and a negative predictive value of 97.5% for capturing EHR-documented aspirin use.16
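The BMI and aspirin definitions can be sketched as simple rules. This is an illustration under stated assumptions, not the published algorithms: the plausibility bounds (12 to 80 kg/m²) are hypothetical placeholders applied to the final BMI rather than the cited cleaning criteria, the BMI cut-points follow those used in the Results (normal <25, obese ≥30), and counting prescriptions and note mentions separately is an assumed reading of "two prescriptions or two mentions".

```python
from datetime import date, timedelta
from statistics import median
from typing import List, Optional, Tuple


def bmi_from_median_weight(weights_kg: List[float], height_m: float,
                           plausible: Tuple[float, float] = (12.0, 80.0)
                           ) -> Optional[float]:
    """BMI (kg/m^2) from the median of weights in the look-back period and a
    single height. The plausibility bounds are hypothetical placeholders."""
    if not weights_kg or height_m <= 0:
        return None
    bmi = median(weights_kg) / height_m ** 2
    low, high = plausible
    return bmi if low <= bmi <= high else None


def bmi_category(bmi: float) -> str:
    # Cut-points as used in the Results: normal <25, obese >=30
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def aspirin_exposed(rx_dates: List[date], note_mention_dates: List[date],
                    colonoscopy: date, lookback_days: int = 365) -> bool:
    """>=2 aspirin prescriptions OR >=2 free-text mentions in the year before
    colonoscopy (sources counted separately; an assumed reading)."""
    start = colonoscopy - timedelta(days=lookback_days)

    def in_window(d: date) -> bool:
        return start <= d <= colonoscopy

    n_rx = sum(1 for d in rx_dates if in_window(d))
    n_notes = sum(1 for d in note_mention_dates if in_window(d))
    return n_rx >= 2 or n_notes >= 2
```

Using the median rather than the mean of repeated weights limits the influence of a single mis-keyed measurement, which is presumably why a median over several years is preferred to a single value.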

Statistical analyses

The primary outcome was CRC, stratified by anatomic site as proximal, distal, and rectal cancers. Risk factors were summarised by descriptive statistics and compared between sites using univariate tests (Kruskal-Wallis test and χ2 test). We used multinomial logistic regression to examine the risk factors for CRC at the three anatomic subsites. The model can be expressed as

$$\log\left(\frac{p_i}{p_0}\right) = b_{i0} + b_{i1}x_1 + \cdots + b_{ip}x_p,$$

where $p_i$ is the probability that the subject is in the $i$th group, $p_0$ is the probability that the subject is in the reference group, $x_1, \ldots, x_p$ are the risk factors, and $b_{i0}, \ldots, b_{ip}$ are the coefficients for group $i$. In the initial analysis, the 'normal control' group was specified as the reference group. To control for time trends in the performance of colonoscopy and in the distribution of risk factors, we considered calendar year of procedure a priori as a potential confounder. All risk factors were included simultaneously in one model; thus, effect estimates are interpreted as associations independent of the other risk factors. Anatomic site-specific ORs with 95% CIs for each candidate risk factor were estimated using adjusted models. For simplicity, we interpret ORs in terms of 'risk': we define increased risk as OR >1, decreased risk as OR <1, and a 95% CI not crossing unity as indicating statistical significance. Where unadjusted and adjusted analyses showed similar results, we present only the adjusted findings.
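To make the link between the model and the reported ORs concrete, the sketch below computes multinomial logit probabilities from coefficients, converts a log-odds coefficient and its standard error into an OR with a Wald 95% CI, and applies the "CI not crossing unity" rule. It is a minimal illustration, not the authors' R or Stata code; the function names are hypothetical and the coefficients in the test are made up, not study estimates.

```python
import math
from typing import Dict, List, Sequence, Tuple


def category_probabilities(x: Sequence[float],
                           coefs: Dict[str, List[float]],
                           reference: str = "control") -> Dict[str, float]:
    """Multinomial logit probabilities.

    coefs maps each non-reference group i to [b_i0, b_i1, ..., b_ip]; the
    reference group's linear predictor is fixed at 0, so that
    log(p_i / p_0) = b_i0 + b_i1*x_1 + ... + b_ip*x_p.
    """
    eta = {reference: 0.0}
    for group, b in coefs.items():
        eta[group] = b[0] + sum(bk * xk for bk, xk in zip(b[1:], x))
    denom = sum(math.exp(v) for v in eta.values())
    return {g: math.exp(v) / denom for g, v in eta.items()}


def odds_ratio_ci(b: float, se: float,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """OR and Wald 95% CI from a log-odds coefficient and its standard error."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


def significant(ci_low: float, ci_high: float) -> bool:
    # A 95% CI lying entirely above or below 1 counts as significant
    return ci_high < 1.0 or ci_low > 1.0
```

Raising $x_1$ by one unit multiplies $p_i/p_0$ by $\exp(b_{i1})$ whatever the values of the other covariates, which is why $\exp(b_{i1})$ is read as the adjusted OR for group $i$ versus controls.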

To assess whether associations differed significantly across anatomic subsites, we also ran the multinomial logistic regression with 'proximal' and then 'distal' specified as the reference category, allowing case-case comparisons of proximal versus distal, proximal versus rectal, and distal versus rectal cancer. For case-case comparisons, p<0.05 was interpreted as statistically significant.
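Changing the reference category does not change the fitted model, so a case-case point estimate can be read off the control-referenced ORs: because $\log(p_A/p_B) = \log(p_A/p_0) - \log(p_B/p_0)$, the case-case OR is the ratio of the two ORs from the same model. The sketch below is illustrative only; `case_case_or` is a hypothetical helper. Applied to the rounded adjusted ORs in the Results, it gives approximate values (about 1.54 for the male-to-female OR, rectal versus distal). A CI or p value for this contrast additionally needs the covariance between the two coefficients, which the published CIs do not provide; refitting with another reference category, as described above, yields it directly.

```python
import math


def case_case_or(or_a_vs_control: float, or_b_vs_control: float) -> float:
    """Case-case OR (group A vs group B) from two ORs versus the shared
    control reference, both taken from the same multinomial model.

    log(p_A/p_B) = log(p_A/p_0) - log(p_B/p_0), so the A-vs-B coefficient
    is b_A - b_B and its exponent equals the ratio of the two ORs.
    """
    return math.exp(math.log(or_a_vs_control) - math.log(or_b_vs_control))


# Male vs female, adjusted ORs from the Results: rectal 2.84, distal 1.84
rectal_vs_distal_sex = case_case_or(2.84, 1.84)
# Aspirin exposure, adjusted ORs from the Results: rectal 0.71, distal 0.85
rectal_vs_distal_aspirin = case_case_or(0.71, 0.85)
```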

To assess potential immortal time bias arising from our requirement that controls be cancer free from baseline colonoscopy through 3 years of follow-up, we performed a sensitivity analysis comparing cases with controls who were cancer free within 6 months rather than 3 years. Because we observed lower odds of CRC with increasing BMI, post hoc analyses assessed trends in BMI from 5 years and 10 years prior to index colonoscopy.

Analyses were performed using R V.3.5.1 and Stata V.15 (StataCorp, College Station, TX).17

Results

From a study base of 1 878 429 veterans with colonoscopy during 1999–2011, we identified 21 744 CRC cases (n=7017 rectal; n=7039 distal; n=7688 proximal) and 612 646 controls who were CRC free at colonoscopy.

Table 1 shows the demographic characteristics of all cases and controls. For all CRC cases combined versus controls, median age was 68 years vs 61 years, 98% vs 95% were male, median BMI was 27.9 kg/m² vs 28.9 kg/m², 28% vs 24% had diabetes, and 25% vs 30% were never smokers; race/ethnicity distributions were similar. Proximal cases were older than distal or rectal cases, had higher rates of diabetes and aspirin exposure, and were more likely to be non-Hispanic black. Distal cases had the highest BMI (28.4 kg/m²) compared with proximal (27.9 kg/m²) and rectal (27.3 kg/m²) cases.

Table 1

Demographic characteristics of CRC cases versus normal colonoscopy controls

Figure 1 shows the anatomic site-specific risk factor associations for CRC cases compared with normal colonoscopy controls by anatomic site, adjusted for other candidate factors. Online supplementary appendix figure 1 depicts univariate associations between candidate factors and site-specific CRC odds.

Figure 1

Risk factors for colorectal cancer by anatomic site. OR findings from adjusted multinomial logistic regression with corresponding 95% CIs, stratified by anatomic site (proximal, distal, rectal). BMI, body mass index; CRC, colorectal cancer.

Demographic factors

Age was associated with increased odds of CRC at all three subsites in both unadjusted and adjusted analyses. A 5-year increase in age was associated with significantly higher odds of proximal cancer (OR=1.58, 95% CI 1.56 to 1.59) than of distal cancer (OR=1.34, 95% CI 1.33 to 1.36) or rectal cancer (OR=1.29, 95% CI 1.28 to 1.30; p<0.05 for all case-case comparisons).
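ORs per 5-year increase can be rescaled to other age increments on the log-odds scale. This is a hedged sketch that assumes age entered the model as a linear term (in 5-year units); if age was modelled differently, the conversion does not hold. The helper name is hypothetical. Under that assumption, the proximal OR of 1.58 per 5 years corresponds to roughly 1.10 per year and 1.58² ≈ 2.50 per decade.

```python
import math


def rescale_or(or_per_k: float, k: float, new_units: float = 1.0) -> float:
    """Convert an OR per k units of a continuous predictor to an OR per
    new_units, assuming the predictor enters the model linearly on the
    log-odds scale: OR_new = exp(log(OR_k) * new_units / k)."""
    return math.exp(math.log(or_per_k) * new_units / k)


# Adjusted ORs per 5-year increase in age, from the Results
per_5y = {"proximal": 1.58, "distal": 1.34, "rectal": 1.29}
per_year = {site: rescale_or(o, 5.0) for site, o in per_5y.items()}
per_decade = {site: rescale_or(o, 5.0, 10.0) for site, o in per_5y.items()}
```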

CRC odds were increased for males compared with females at all sites in unadjusted analyses. In adjusted analyses, however, males had increased odds only of distal cancer (OR=1.84, 95% CI 1.50 to 2.24) and rectal cancer (OR=2.84, 95% CI 2.25 to 3.58); case-case analyses showed that the male-to-female OR was significantly higher for rectal than for distal cancer (p<0.05).

In unadjusted analyses, compared with non-Hispanic whites, blacks had no significantly increased odds of distal cancer, increased odds of proximal cancer (OR=1.24, 95% CI 1.17 to 1.31) and decreased odds of rectal cancer (OR=0.77, 95% CI 0.72 to 0.83). In adjusted analyses, blacks had reduced odds of rectal cancer (OR=0.88, 95% CI 0.82 to 0.95), but increased odds of proximal cancer (OR=1.62, 95% CI 1.52 to 1.72) and distal cancer (OR=1.27, 95% CI 1.19 to 1.37); the ORs for the three subsites differed significantly (p<0.05 for all case-case comparisons).

In unadjusted analyses, Hispanics had increased odds of distal cancer (OR=1.28, 95% CI 1.15 to 1.43) compared with non-Hispanic whites. In adjusted analyses, Hispanics had increased odds of cancer for all three sites (OR=1.57, 95% CI 1.40 to 1.77 for distal; OR=1.36, 95% CI 1.19 to 1.54 for proximal; OR=1.32, 95% CI 1.17 to 1.50 for rectal). Among Hispanics, distal cancer odds were significantly higher than rectal or proximal cancer odds (p<0.05 for case-case comparisons).

In unadjusted analyses, individuals classified as obese (BMI ≥30.0 kg/m²) had lower odds of cancer at all three sites (OR=0.62, 95% CI 0.58 to 0.66 for proximal; OR=0.79, 95% CI 0.74 to 0.85 for distal; OR=0.47, 95% CI 0.44 to 0.51 for rectal). In adjusted analyses, compared with individuals with normal BMI (<25 kg/m²), obese individuals had reduced odds of rectal cancer (OR=0.59, 95% CI 0.55 to 0.64) and proximal cancer (OR=0.88, 95% CI 0.82 to 0.94), with significantly lower odds for rectal than for proximal cancer (p<0.05).

In both unadjusted and adjusted analyses, increased height was weakly associated with increased odds of CRC. In adjusted analysis, increased height was associated with increased odds of proximal cancer (OR=1.03, 95% CI 1.02 to 1.04), distal cancer (OR=1.02, 95% CI 1.01 to 1.03) and rectal cancer (OR=1.01, 95% CI 1.00 to 1.02).

Clinical factors

Diabetes was associated with increased odds of proximal cancer (OR=1.40, 95% CI 1.33 to 1.47) and distal cancer (OR=1.25, 95% CI 1.18 to 1.31) in unadjusted analyses. In adjusted analyses, diabetes was associated with increased odds of proximal cancer (OR=1.29, 95% CI 1.22 to 1.36), significantly more so than for distal cancer (OR=1.15, 95% CI 1.08 to 1.22) or rectal cancer (OR=1.12, 95% CI 1.06 to 1.19) (p<0.05 for case-case comparisons).

Current smoking was associated with increased odds of distal cancer (OR=1.10, 95% CI 1.02 to 1.18) and rectal cancer (OR=1.51, 95% CI 1.41 to 1.62) compared with never smoking in unadjusted analyses. In adjusted analyses, current smokers had significantly increased odds of CRC at all three sites compared with never smokers, with significantly higher odds of rectal cancer (OR=1.81, 95% CI 1.68 to 1.95) than of proximal cancer (OR=1.53, 95% CI 1.43 to 1.65) or distal cancer (OR=1.46, 95% CI 1.35 to 1.57) (p<0.05 for case-case comparisons).

Aspirin exposure was associated with increased odds of proximal cancer (OR=1.23, 95% CI 1.17 to 1.29) and distal cancer (OR=1.06, 95% CI 1.01 to 1.11) but decreased odds of rectal cancer (OR=0.83, 95% CI 0.79 to 0.87) in unadjusted analyses. In adjusted analyses, however, aspirin exposure was associated with reduced odds of cancer at all three subsites, most strongly for rectal cancer (OR=0.71, 95% CI 0.67 to 0.76) compared with distal cancer (OR=0.85, 95% CI 0.81 to 0.90) and proximal cancer (OR=0.91, 95% CI 0.86 to 0.95); two-sided p values for all case-case comparisons were less than 0.05.

Sensitivity and post hoc analyses

We performed post hoc analyses to further investigate our findings for BMI, looking at change in BMI from 10 and 5 years prior to baseline measurement (online appendix figure 2). Our findings indicated that from 5 years prior to baseline, mean BMI declined among proximal, distal and rectal cases, but increased among controls. As such, the lower BMI seen in cases compared with controls may in part be explained by weight loss developing as a result of occult cancer in the period prior to colonoscopic diagnosis.

Our sensitivity analyses to explore the impact of potential immortal time bias introduced by requiring controls to be CRC free for 3 years showed that relaxing criteria to being CRC free for 6 months resulted in qualitatively similar results (data not shown).

Discussion

Among the 21 744 CRC cases and 612 646 normal colonoscopy controls, we found that candidate risk factor associations for CRC vary markedly by anatomic site. Significant differences in presence and magnitude of site-specific associations were found for a number of traditionally cited risk factors for CRC, including male sex, age, race/ethnicity, BMI, height, diabetes, and smoking. As such, accounting for anatomic site in epidemiological studies of CRC risk may allow for more accurate insights into CRC pathogenesis and strategies for cancer control. Our findings extend and clarify prior research on traditional risk factors for CRC.

Age

Consistent with prior work, we found that a 5-year increase in age was associated with 1.54-fold increased odds of proximal cancer, reflecting a shift from distal to proximal colon cancer with increasing age. This distal-to-proximal shift has been shown in prior studies, particularly among adults over age 70.8 18–22 Multiple studies have shown that these age-related findings for proximal cancer were particularly strong among women.18 20 21 Iida et al reported that the increased risk with age could be related to changes in the production or composition of bile acids, which have been associated with colorectal carcinogenesis in the proximal colon. In postmenopausal women specifically, Iida et al postulated that decreased oestrogen secretion may lead to increased secondary bile acid production and, in turn, increased CRC risk.18 23

Sex

Compared with females, males had increased risk of CRC at all sites in unadjusted analyses, in agreement with findings from prior studies.19 24 25 Notably, in adjusted analyses males had nearly twice the odds of distal cancer and nearly three times the odds of rectal cancer compared with females, a finding also supported by prior research.26 One potential explanation for this finding, independent of the other candidate risk factors in our model, is unmeasured sex-specific lifestyle factors. Poynter et al suggested that alcohol consumption could increase rectal cancer risk among males, but not among females.27 Among females, previous studies have suggested a protective effect of hormone replacement therapy or oral contraceptive use, with Gao et al noting that the protective effects of hormones could explain the larger difference in rectal cancer risk between males and females.24 28 29

Race/ethnicity

Our study found that the risk of proximal colon cancer was increased among non-Hispanic blacks, consistent with findings from previous studies.2 30 31 Irby et al hypothesised that the higher black-to-white risk ratios for proximal colon cancer could be attributed to a higher baseline diabetes prevalence among blacks than among non-Hispanic whites.30 However, the higher risk of proximal cancer among blacks persisted after adjustment for diabetes in our analyses. Some have hypothesised that blacks may have a higher proportion of cancers with microsatellite instability, which are more likely to arise in the proximal than the distal colon, but this observation has not been consistently borne out by other studies.32 33 As such, the reasons for higher rates of proximal CRC among blacks in this and other studies require further investigation.

Our study also found that Hispanics had increased risk of distal colon cancer compared with non-Hispanic whites, in line with two prior studies.34 35 Chattar-Cora et al found higher rates of distal colon cancer among Hispanics in their study, noting that this might reflect more than half of their sample being under age 65, when distal colon cancer is more likely.35 While Jafri et al had similar findings, they cautioned that effects likely vary across Hispanic subgroups. These potential differences highlight a need for more research on CRC risk, particularly distal colon cancer risk, within these subpopulations.34

Body mass index

Our finding that higher BMI was significantly protective against proximal and rectal cancer (that is, at all subsites except the distal colon) was surprising, given that CRC is identified as an obesity-related cancer and prior studies consistently show obesity to be associated with increased CRC risk, regardless of site.36–38 We speculate on several potential reasons for this discordance. The median BMI of individuals in our study was consistent with overweight (28.9 kg/m² for controls and 27.9 kg/m² for cases), which is higher than in most studies of obesity and CRC risk and could have affected risk associations. BMI was recorded from measurements just prior to colonoscopy, and because CRC can cause weight loss, this timing could make a higher BMI appear protective. In post hoc analyses, we found that BMI decreased markedly among cases within the 5 years prior to baseline, which is consistent with this hypothesis.

Another potential explanation for this discrepancy is the use of BMI as a surrogate measure for overall body fat. In our study, we calculated BMI using a median weight derived from 3 years of weight measurements, and a single height measure. A previous meta-analysis found that abdominal obesity is a more sensitive indicator of CRC risk than BMI, and that visceral obesity might be the main driving factor of the association between obesity and CRC risk.36 39 In this study, we were unable to differentiate between abdominal and visceral obesity, though future studies should consider this distinction to better understand the association between obesity and CRC risk. Additionally, we ascertained presence/absence of obesity over a short time frame (3 years prior to index colonoscopy); duration of obesity may be a better measure of obesity-related risk, and lack of association may have been due to inability to study persistent obesity.

Height

Increased height was associated with a slightly increased risk of all CRC types. These findings align with those of prior systematic reviews showing that height is a risk factor for CRC, notably for proximal and distal cancers.40 41 A recent multinational cohort study additionally found that increased height was associated with increased proximal and distal cancer risk, but not rectal cancer risk.42 Potential mechanisms for this relationship include increased exposure to growth factors, such as growth hormone or insulin-like growth factors, in childhood or early adulthood, and excess calorie consumption in early life.40 43 44

Diabetes

Diabetes prevalence was associated with increased risk of all CRC types, but significantly higher risk of proximal colon cancer, which was consistent with prior research.45 46 An underlying mechanism that could explain higher risk of proximal colon cancer is the effect of hyperinsulinaemia on the colon.45 Insulin has mitogenic effects on CRC tissue, and upregulates leptin expression, which has been shown to increase cell proliferation within only the proximal colon.45 47 48 However, there have been few studies examining the effect of serum or plasma insulin levels on CRC risk to support this theory.47 49 50 More research is needed to understand the potential mechanisms by which diabetes may increase CRC (particularly proximal cancer) risk.

Smoking

Current smoking was associated with increased CRC odds at all three sites, particularly for rectal and proximal cancer, while former smoking was associated with increased risk of rectal cancer; all of these findings align with the current literature.51 52 Botteri et al indicated that smoking is associated with CRCs displaying high microsatellite instability, which tend to arise from the serrated pathway.51 The authors suggested this might explain why smoking-related cancer risk is higher in the proximal colon and rectum.51 Another explanation, postulated by Leufkens et al, is that smoking might be a risk factor for flat colorectal adenomas, which are more commonly found in the proximal colon.53 54

Aspirin

Within our study, aspirin exposure was found to be significantly protective against all sites with the strongest protection against rectal cancer compared with other sites. Previous studies showed aspirin use to be protective against CRC risk, regardless of site.55 56 While it is likely that the benefits of aspirin use outweigh the potential risks, more research needs to be conducted to better understand all potential mechanisms that cause aspirin to have a protective effect, particularly for rectal cancer.


We comprehensively examined the association of eight major candidate CRC risk factors with site-specific CRC odds. To date, most previous studies have lacked adequate sample size to stratify findings by site. Given the molecular, clinical and pathological differences among CRCs arising from each site, stratifying findings by site can help researchers investigate site-specific mechanisms of colorectal tumorigenesis and inform policies that promote more effective prevention of specific types of CRC.

Our findings suggest that differences in CRC subsite risk exist by race and ethnicity, even within an equal access public healthcare system. Given that the VHA works to minimise financial barriers and provide quality care to all veterans, findings of racial and ethnic differences indicate that further studies are needed to learn more about factors that could predispose different racial or ethnic groups to site-specific CRC risk. Differences in risk by race/ethnicity despite an equal access system may point to differences in unmeasured biological factors or environmental exposures that may modify risk.

The increased risk of CRC at all sites among current smokers suggests that more targeted screening efforts could help prevent CRC in these higher risk individuals. Our observation that smoking is associated with CRC risk at all subsites, but appears most closely associated with rectal and proximal cancer risk, suggests that the mechanisms driving risk may differ by anatomic subsite and that further study of site-specific drivers of CRC risk is needed.

Strengths and limitations

Several limitations should be considered when interpreting this work. The study population comprised veterans receiving care within the VHA; as such, the findings may not be representative of the general US population. The sample was predominantly male, reflecting the population of older US veterans. However, we included 380 female cases and 32 277 female controls, so although women were underrepresented relative to men, a large absolute number of women was still included. Data on additional candidate risk factors, such as physical activity, diet, and alcohol use, were not available within the EHR used for this study, precluding exploration of their association with site-specific CRC risk. Duration and dose of risk factor exposures, particularly for aspirin and diabetes, were not extensively measured, limiting our ability to explore potential causality in detail. We also did not test interactions to examine combined effects of risk factors, such as smoking status and obesity or smoking status and aspirin exposure; this is an important direction for future work. Risk factor data were ascertained within 1 year prior to baseline (index) colonoscopy, which raises concerns about bias due to left truncation. We restricted risk factor collection to within 1 year of baseline because EHR data are not all measured at the same time, while keeping the window small enough to avoid misrepresenting risk factor status at baseline. Thus, we anticipate the potential bias from left truncation to be minimal.

Our study also has several strengths. This study is one of the largest case–control studies to date to measure the association of key risk factors to anatomic site-specific risk for CRC. Cases were ascertained from the VACCR, which uses a rigorous process to collect information on cancer cases locally, and then validate them centrally. Furthermore, the use of normal colonoscopy controls without CRC or adenomas at baseline ensures greater comparability between cases and controls than previous studies could provide.

Conclusion

Our findings show that the presence and strength of associations between candidate risk factors and CRC may differ by anatomic site. Based on our observations, we suggest that future studies focus on better understanding the mechanisms underlying some of these associations, such as those between diabetes and proximal cancer risk, former smoking and rectal cancer risk, and aspirin exposure and site-specific CRC risk. Ultimately, accounting for anatomic site in epidemiological studies of CRC may enable better insights into CRC pathogenesis and potential cancer control strategies. Accordingly, anatomic site should be a key consideration in future studies of CRC risk.

References

  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.
  21. 21.
  22. 22.
  23. 23.
  24. 24.
  25. 25.
  26. 26.
  27. 27.
  28. 28.
  29. 29.
  30. 30.
  31. 31.
  32. 32.
  33. 33.
  34. 34.
  35. 35.
  36. 36.
  37. 37.
  38. 38.
  39. 39.
  40. 40.
  41. 41.
  42. 42.
  43. 43.
  44. 44.
  45. 45.
  46. 46.
  47. 47.
  48. 48.
  49. 49.
  50. 50.
  51. 51.
  52. 52.
  53. 53.
  54. 54.
  55. 55.
  56. 56.


  • LL and SG are co-senior authors.

  • JD and AE are co-first authors.

  • Contributors Concept and design: JD, AE, RB, LL, AKB, JM, MEM, SG. Analysis and interpretation of data: JD, AE, RB, LL, SG. Drafting of manuscript: JD, AE, RB, LL, AKB, JM, MEM, SG. Critical revision of the manuscript for important intellectual content: JD, AE, RB, LL, AKB, JM, MEM, SG. Statistical analysis: JD, RB, LL, SG. Obtained funding: SG.

  • Funding This research was supported by the VA Health Services Research and Development (Grant No 5I01HX001574-04, PI: SG) and the National Cancer Institute/National Institutes of Health (Grant No 1R37CA222866-01, PI: SG; Grant No 1F32CA239360-01, PI: JD).

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement All data relevant to the study are included in the article or uploaded as supplementary information.