Reduced keratin expression in colorectal neoplasia and associated fields is reversible by diet and resection

Background Patients with adenomatous colonic polyps are at increased risk of developing further polyps suggesting field-wide alterations in cancer predisposition. The current study aimed to identify molecular alterations in the normal mucosa in the proximity of adenomatous polyps and to assess the modulating effect of butyrate, a chemopreventive compound produced by fermentation of dietary residues. Methods A cross-sectional study was undertaken in patients with adenomatous polyps: biopsy samples were taken from the adenoma, and from macroscopically normal mucosa on the contralateral wall to the adenoma and from the mid-sigmoid colon. In normal subjects biopsies were taken from the mid-sigmoid colon. Biopsies were frozen for proteomic analysis or formalin-fixed for immunohistochemistry. Proteomic analysis was undertaken using iTRAQ workflows followed by bioinformatics analyses. A second dietary fibre intervention study arm used the same endpoints and sampling strategy at the beginning and end of a high-fibre intervention. Results Key findings were that keratins 8, 18 and 19 were reduced in expression level with progressive proximity to the lesion. Lesional tissue exhibited multiple K8 immunoreactive bands and overall reduced levels of keratin. Biopsies from normal subjects with low faecal butyrate also showed depressed keratin expression. Resection of the lesion and elevation of dietary fibre intake both appeared to restore keratin expression level. Conclusion Changes in keratin expression associate with progression towards neoplasia, but remain modifiable risk factors. Dietary strategies may improve secondary chemoprevention. Trial registration number ISRCTN90852168.


LC-MS/MS analysis
Fractions collected from offline separation techniques were eluted through the Famos-Ultimate 3000 nano-LC system (Dionex, LC Packings, The Netherlands) interfaced with a QSTAR XL (Applied Biosystems; MDS-Sciex) tandem ESI-QUAD-TOF MS. Vacuum-dried fractions were resuspended in loading buffer (3% acetonitrile, 0.1% trifluoroacetic acid), injected and captured on a 0.3×5 mm trap column (3 μm C18 Dionex-LC Packings). Trapped samples were then eluted onto a 0.075×150 mm analytical column (3 μm C18 Dionex-LC Packings) using an automated binary gradient at a flow of 300 nL/min from 95% buffer A (3% acetonitrile, 0.1% formic acid) to 35% buffer B (97% acetonitrile, 0.1% formic acid) over 90 min, followed by a 5 min ramp to 95% buffer B (with isocratic washing for 10 min). Predefined 1 s 350−1600 m/z MS survey scans were acquired, with up to two dynamically excluded precursors selected for a 3 s MS/MS (m/z 65−2000) scan. The collision energy range was increased by 20% compared with that used for unlabeled peptides in order to overcome the stabilizing effect of the basic N-terminal derivatives and to achieve equivalent fragmentation, as recommended by Applied Biosystems.

Protein identification and relative quantification
The mass-spectrometric data were collected and analysed as previously described 2, 3. Briefly, MS/MS data generated from the QSTAR® XL were first converted to generic MGF peaklists using the mascot.dll embedded script (version 1.6 release no. 25) in Analyst QS v. 1.1 (Applied Biosystems, Sciex; Matrix Science). Further processing of the data was undertaken using an in-house Phenyx algorithm cluster (binary version 2.6; Geneva Bioinformatics SA) at the ChELSI Institute, University of Sheffield, against the Homo sapiens UniProt protein knowledgebase (SwissProt and TrEMBL, 41070 and 71449 entries respectively, downloaded 5th November 2010) to derive peptide sequences and hence protein identifications. These data were then searched against the reversed Homo sapiens database to estimate the false-positive rate 4. Peptide identifications at 1% false discovery rate were accepted. The iTRAQ reporter ion intensities were exported. Protein quantifications were obtained by computing the geometric means of the reporters' intensities. Median correction was subsequently applied to every reporter in order to compensate for systematic errors, e.g. if a sample happened to have been loaded at a largely different total concentration. The reporters' intensities in each individual MS/MS scan were also median corrected using the same factors, with the rationale that if the total concentration of a sample A was half that of another sample B, the intensities of sample A's reporter have to be doubled to allow for a fair comparison. The t-tests applied to determine alterations in protein level between samples used these corrected intensities. Because a t-test was carried out for every protein, the threshold (α=5%) used for significance was corrected for multiple testing. Here, we used the standard Bonferroni correction (α/P, where P is the number of proteins) to minimise false positive results. This workflow was developed in house 2.

2 Pham, T.K., et al., A quantitative proteomic analysis of biofilm adaptation by the periodontal pathogen Tannerella forsythia. Proteomics, 2010. 10(17): p. 3130-41.
3 Majumdar, D., Rosser, R., Havard, S., Lobo, A. J., Wright, P. C., Evans, C. A., Corfe, B. M., An integrated workflow for extraction and solubilization of intermediate filaments from colorectal biopsies for proteomic analysis. Electrophoresis, 2012. 33(13): p. 1967-74.
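The median correction, geometric-mean quantification and Bonferroni threshold described above can be illustrated with a minimal Python sketch. It is not the in-house workflow itself. The function names, the dictionary layout of the scans and the choice of reference level (the median of the per-channel medians) are assumptions made for illustration only.

```python
from statistics import geometric_mean, median


def median_correction_factors(scans):
    """Return one scaling factor per reporter channel.

    `scans` is a list of dicts mapping a reporter label to its intensity
    in one MS/MS scan (hypothetical layout). Each channel is scaled so
    that its median matches a common reference, here the median of the
    per-channel medians (an assumption, not stated in the text).
    """
    labels = sorted(scans[0])
    channel_medians = {lab: median(s[lab] for s in scans) for lab in labels}
    reference = median(channel_medians.values())
    return {lab: reference / channel_medians[lab] for lab in labels}


def apply_correction(scans, factors):
    """Multiply every reporter intensity by its channel's factor."""
    return [{lab: value * factors[lab] for lab, value in s.items()} for s in scans]


def protein_quantification(corrected_scans):
    """Geometric mean of the corrected intensities, per reporter channel.

    Assumes strictly positive intensities for the scans of one protein.
    """
    labels = sorted(corrected_scans[0])
    return {lab: geometric_mean(s[lab] for s in corrected_scans) for lab in labels}


def bonferroni_threshold(alpha, n_proteins):
    """Per-test significance threshold alpha / P for P proteins tested."""
    return alpha / n_proteins


if __name__ == "__main__":
    # Made-up intensities: channel 114 was loaded at half the amount of 115.
    scans = [
        {"114": 50.0, "115": 100.0},
        {"114": 100.0, "115": 200.0},
        {"114": 200.0, "115": 400.0},
    ]
    factors = median_correction_factors(scans)
    print(factors)
    print(protein_quantification(apply_correction(scans, factors)))
    print(bonferroni_threshold(0.05, 183))
```

In this toy input the under-loaded channel is scaled up relative to the other, which mirrors the "sample A at half the concentration of sample B" rationale given in the text. The t-test itself is omitted; its p-values would be compared against the value returned by `bonferroni_threshold`.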

Calculation of pI and MW
The physicochemical properties of peptides identified and relatively quantified in the analysis were plotted using the Innovagen peptide property calculator.
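For readers who want to see what such a calculation involves, the following Python sketch estimates the average molecular weight and the isoelectric point of a peptide sequence. It does not reproduce the Innovagen calculator. The residue masses are approximate average values, and the pKa set is one commonly used illustrative set that may differ from the values used by that tool, so the resulting pI should be read as an estimate only.

```python
# Approximate average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

# Illustrative pKa values (an assumption; published sets differ).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def molecular_weight(sequence):
    """Average mass of a linear, unmodified peptide in Da."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def net_charge(sequence, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    positive += sum(
        1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        for aa in sequence if aa in PKA_POSITIVE
    )
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    negative += sum(
        1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
        for aa in sequence if aa in PKA_NEGATIVE
    )
    return positive - negative


def isoelectric_point(sequence, tolerance=1e-4):
    """Find the pH of zero net charge by bisection over pH 0 to 14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "LGSDEK"  # hypothetical sequence, not taken from the dataset
    print(round(molecular_weight(peptide), 2), round(isoelectric_point(peptide), 2))
```

Bisection works here because the net charge decreases monotonically with pH, so there is a single crossing point between pH 0 and 14.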

Supplementary Information 3: Detailed 2D gel method
Protein lysates were prepared from frozen patient mononuclear cell material, which was resuspended in 350 μl of isoelectric focusing buffer (9 M urea, 2 M thiourea, 4% (w/v) CHAPS, 65 mM dithiothreitol, 0.5% (v/v) IPG buffer (Amersham Biosciences)) per 1 × 10^6 cells. Isoelectric focusing was performed using 18-cm immobilized pH gradient strips (pH 3-10 NL) and the Multiphor II instrument (Amersham Biosciences), focusing for a total of 49 kVh at 20 °C. The second dimension was a standard SDS-PAGE protocol using the ISODALT system (Amersham Biosciences). Strips were equilibrated for 10 min in equilibration buffer (50 mM Tris-HCl, pH 6.8, 6 M urea, 30% (v/v) glycerol, 2% (v/v) SDS) containing 65 mM dithiothreitol and then for 10 min in the same buffer containing 240 mM iodoacetamide. Second-dimension gels were 12% SDS-PAGE gels of 160 × 180 × 0.75 mm. Gels were stained with silver using the protocol of Shevchenko et al. 5.

Supplementary Information 6: Protein interaction network for the IF Dataset
Protein identifiers for the IF fraction were submitted to the STRING database. Orphan nodes were excluded from the representation. All remaining proteins formed a potential single network, centred on the clusters around keratins 8 and 18, vimentin and the collagens. Whilst caution is needed as some edges are based solely on text mining, the data suggest a surprising cohesiveness.
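The two network operations mentioned above, excluding orphan nodes and checking whether the remainder forms a single network, can be sketched in Python as follows. The node names and edges in the example are invented for illustration and are not STRING output. The function names are likewise hypothetical.

```python
from collections import defaultdict, deque


def drop_orphans(nodes, edges):
    """Keep only nodes that take part in at least one edge."""
    linked = {node for edge in edges for node in edge}
    return [node for node in nodes if node in linked]


def is_single_network(nodes, edges):
    """True if every node is reachable from the first one (breadth-first search)."""
    if not nodes:
        return False
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen >= set(nodes)


if __name__ == "__main__":
    # Illustrative identifiers and edges only (assumed, not real interactions).
    nodes = ["KRT8", "KRT18", "VIM", "COL1A1", "ORPHAN1"]
    edges = [("KRT8", "KRT18"), ("KRT18", "VIM"), ("VIM", "COL1A1")]
    kept = drop_orphans(nodes, edges)
    print(kept, is_single_network(kept, edges))
```

A check of this kind treats every edge equally. Weighting or filtering edges by evidence type, for example discarding those supported only by text mining, would be a stricter variant.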

Supplementary Information 7: Analysis of pathways representation for IF Dataset
The entire sample set listed in the table in section 5 was assessed using the pathways analysis tool in Reactome (www.reactome.org) on 18th February 2012. The pathways identified as enriched are colour-coded as a heat-map and assigned a P-value following hypergeometric analysis. The data suggest that NCAM signalling for neurite outgrowth is highly over-represented (P>1.2 × 10^-10). Other over-represented pathways include collagen formation and modification. As the dataset is skewed towards cytoskeletal and insoluble proteins, this confirms that the fractionation is enriching correctly.
A tabulated version of these data is shown below.

Characterisation of proteins in IP unbound fraction
The abundance and variation of proteins present in the unbound IP fraction were determined by SDS-PAGE and silver staining (Panel A). The figure shows large numbers of protein species in the sample, with a representative cross-section of molecular weights. These data demonstrate that the sample remains in good condition (little degradation) and is not skewed to species of any particular mass range. Samples were digested with trypsin, iTRAQ-labelled and separated by SCX. The distribution of peptides/proteins across the SCX fractionation is shown in Panel B. In total 183 proteins were identified using this approach.
Panel C shows the pooling strategy. Subjects were stratified by diagnosis (Nor, normal; Ade, adenoma) and by the concentration of butyrate in faecal samples. Four subjects' worth of biopsies were required for a successful IP (data not shown). The mean butyrate level of subjects' faecal SCFA was determined for each pool; pools are identified by diagnosis and mean butyrate, shown on the x-axis. There was no significant difference in mean stool butyrate levels between the highest and lowest pools (t-test). Protein identifiers covering all identified proteins were entered into the Reactome Instance Browser (accessed 27.02.13). Pathways and subpathways with a P-value <10^-4 are listed, along with information on the proportion of each pathway identified.
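The pathway P-values reported here come from Reactome. As a rough illustration of the hypergeometric over-representation test that underlies this kind of analysis, the Python sketch below computes the probability of observing at least a given number of pathway members in a protein list by chance. The background size and pathway counts in the example are hypothetical, and Reactome's own background set and any further corrections are not described in the text, so this is not a reproduction of its calculation.

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_size, sample_size, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population   : number of proteins in the background (assumed value)
    pathway_size : background proteins annotated to the pathway
    sample_size  : proteins in the submitted list
    hits         : submitted proteins that fall in the pathway
    """
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # 183 proteins were identified; every other number here is made up.
    p = hypergeom_enrichment_p(
        population=20000, pathway_size=150, sample_size=183, hits=12
    )
    print(p, p < 1e-4)
```

Exact integer arithmetic with `math.comb` avoids the rounding problems that floating-point factorials would cause for backgrounds of this size.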