Introduction
In 2021, an estimated 76,080 adults in the United States will be diagnosed with kidney cancer, and an estimated 13,780 deaths from this disease will occur1. Renal cell carcinomas (RCCs), the most common family of kidney tumors and one of the top ten most common cancers in the US, are further stratified into several histologic subtypes. The most common subtype is clear cell RCC (ccRCC), accounting for ~75% of cases; papillary RCC (pRCC) and chromophobe RCC (chRCC) account for ~15% and 5% of cases, respectively2. These subtypes display divergent clinical behavior with regard to prognosis and response to therapeutic agents3,4,5. Large-scale molecular profiling efforts have characterized the genomic and transcriptomic landscapes of ccRCC6, pRCC7, and chRCC8. These analyses revealed remarkable heterogeneity among these forms of RCC, with each subtype exhibiting distinctive somatic mutations, chromosomal copy number alterations, and gene expression profiles3. Notably, the histone modifications and sites of chromatin accessibility driving the transcriptional landscapes of RCC histotypes, and how this relates to kidney cancer heritability, have not been systematically explored.
Enhancers are cis-acting DNA regions that bind trans-acting proteins to determine cell-type-specific gene expression patterns and responses to internal and external signals9,10. Chromatin immunoprecipitation and sequencing (ChIP-seq) of post-translational histone modifications (e.g., H3K27ac and H3K4me2) have identified millions of enhancers in mammalian genomes with the number of active enhancers in any given cell type estimated to be in the tens of thousands10,11. Profiling enhancers has emerged as a powerful tool to characterize critical transcription factors (TFs) driving cellular transcriptional states and to better understand germline risk variants. It has been established that cellular identity and function is determined primarily by a subset of TFs termed “master” TFs12,13,14,15,16. These TFs occupy active enhancers in the cell, and preferentially bind within exceptionally large enhancer domains termed “super-enhancers” (SEs) or stretch enhancers, that regulate genes required for establishing cell identity and function17,18. Moreover, master TFs participate in interconnected auto-regulatory circuitries or “cliques” that are self-reinforcing, show marked cell selectivity, and function to maintain cell state and/or cell survival17,19. In addition, analyses of population-based epigenomes have further revealed that expression quantitative trait loci (eQTLs) SNPs are often also associated with variation in nearby epigenomic features (such as active enhancers marked by H3K27ac) in coordinated regulatory modules20,21,22,23,24,25, motivating the use of epigenetic datasets for better functional characterization of these loci. Many of the prior epigenetic data in RCC were based on cell lines, which diverge from their original tumors and do not represent all histologic subtypes26.
Herein, we define the epigenetic architecture and circuitry of RCC across different histologic subtypes in 42 primary human RCC tumors using a combination of genome-wide H3K27ac, H3K4me2, and chromatin accessibility assays, and integrate data from targeted DNA sequencing and bulk RNA sequencing. We delineate distinctive enhancers operative in the different RCC histologies, nominate putative histology-specific master TFs, and prioritize RCC GWAS risk loci for functional validation.
Results
Mapping the chromatin regulatory landscape of RCC
To examine the cistrome and to identify master TFs across the histologic subtypes of RCC, we performed histone chromatin immunoprecipitation followed by sequencing (ChIP-seq), the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq), and RNA sequencing (RNA-seq) on 42 fresh frozen RCC tumor samples (24 ccRCC, 6 pRCC, 12 chRCC). Thirty-eight of 40 (95%) tissues were derived from radical nephrectomies. Of the 6 pRCC tumors, 4 were type I, 1 was type II, and 1 was unknown. Patients were grouped into two cohorts (1 and 2) as defined in Supplementary Data 1, Supplementary Fig. S1, and the Results section. We conducted H3K4me2 ChIP-seq to map both active and poised enhancers27 and H3K27ac ChIP-seq to identify active promoters and enhancers28. ATAC-seq was performed to define the chromatin accessibility landscape, and RNA-seq was performed to capture the transcriptional programs of each RCC subtype. A total of 153 libraries were generated across the different datatypes (Supplementary Data 1 and 2 and Supplementary Fig. S1). Using cohort 1 to assess the regulatory landscape across the different RCC histologies, a total of 153,321 promoter-distal (enhancer) H3K27ac ChIP-Seq peaks were identified across all the samples, most of which (n = 136,469) were common to two or more histologies (Fig. 1A, B). For example, the PAX8 locus is marked by H3K27ac in all samples. PAX8 is a TF involved in early kidney embryogenesis and oncogenesis in RCC29,30 and is a clinical diagnostic tool to help differentiate RCC from other malignancies31 (Fig. 1C). Unsupervised hierarchical clustering (Fig. 1D) and principal component analysis (PCA) (Supplementary Fig. S2A) of H3K27ac peaks clearly segregated the three histologic types of RCC, and both analyses demonstrated that the H3K27ac landscape in chRCC was more distinct than either pRCC or ccRCC. Moreover, unsupervised hierarchical clustering of our cohort with an independent cohort of matched normal (n = 10) from the KIRC TCGA cohort32 showed a clean separation of normal tissue from the different tumor histotypes (chRCC, ccRCC, pRCC, Supplementary Fig. S2B, S2C). In total, 16,852 peaks were significantly increased or decreased in one tumor histology compared to the other two (e.g., chromophobe versus clear cell and papillary) (“Methods”, false discovery rate (FDR) of 0.001 and least a fourfold difference in mean peak intensity between groups). In all, 12,908 sites were upregulated in one histology: 8939 were chRCC-specific, 3653 were pRCC-specific, and 316 were ccRCC-specific (Fig. 1E). These histology-specific H3K27ac peaks were differentially marked by H3K4me2 ChIP-Seq and were associated with open chromatin, strongly suggesting that they were histology-specific active enhancers (Supplementary Fig. S2D). Moreover, differential epigenetic sites correlated with the nearest gene expression difference (P value <2 × 10−16, chi-square test) (Fig. 1F–H). GREAT33 analysis of the 8939 chRCC-specific H3K27ac peaks revealed enrichment for genes involved in actin regulation, fatty acid oxidation, and ion transmembrane transporter activity (Fig. 2A), consistent with previously reported mRNA signature analysis showing an increased ion transmembrane transport signature in chRCC34. A similar analysis of the 3653 pRCC-specific sites showed enrichment for genes involved in renal system development (Fig. 2B). This is consistent with the notion that pRCC arises from embryonic nephrogenic rest precursor lesions, which persist during adult life35. Since there was a relatively small number of ccRCC-specific sites (n = 316) in comparison to chRCC and pRCC, and the enhancer landscape of ccRCC resembled that of pRCC more than chRCC, we compared H3K27ac sites between ccRCC and pRCC only. The majority of H3K27ac sites were common to the two histologies (n = 113,786), while 1265 sites were ccRCC-specific, and 2661 sites were pRCC-specific (Supplementary Figs. S2E and S3A). The 1265 ccRCC-enriched peaks were associated with genes involved in circulatory system development and angiogenesis (Fig. 2C), while the 2661 pRCC-enriched peak genes were again enriched for genes involved in kidney embryogenesis (Supplementary Fig. S3B). Similar analysis of the H3K4me2 peaks demonstrated clear separation of chRCC from the other two RCC subtypes with comparable histology-specific and common “poised” sites (Supplementary Fig. S3C–3F). H3K27ac and H3K4me2 ChIP-seq signals for all samples were strongly correlated (Pearson correlation, r = 0.73) (Fig. 2D).
A Distribution of RCC H3K27ac peaks according to genomic region for 30 fresh frozen RCC tumor samples (12 chRCC, 6 pRCC, 12 ccRCC). B Numbers of histology-specific and common H3K27ac peaks. C H3K27ac profiles at PAX8 in six representative samples from each RCC histology. D Hierarchical clustering of chRCC, ccRCC, and pRCC based on sample-to-sample pairwise correlation of the H3K27ac ChIP-seq peaks. E Distribution of histology-specific H3K27ac peaks among RCC subtypes. F–H Volcano plots with the log change of gene expression (FPKM) in one histology compared to the other two histologies (F ccRCC vs. others, G pRCC vs. others, H chRCC vs. others). Two-sided P values were used and corrected for multiple comparison testing (FDR-adjusted P value <0.05). RCC renal cell carcinoma, chRCC chromophobe RCC, ccRCC clear cell RCC, pRCC papillary RCC.
A GREAT analysis of chromophobe-enriched peaks (n = 8939). Two-sided P values are shown. B GREAT analysis of papillary-enriched peaks (n = 3653). Two-sided P values are shown. C GREAT analysis of clear cell-enriched peaks in clear cell vs. papillary only comparison (n = 1265). A–C GREAT calculates statistical enrichments for association between genomic regions and annotations. Two-sided P values are shown. D Density map of correlation between H3K27ac versus H3K4me2 ChIP-seq peaks across subtypes. E Four most significantly enriched nucleotide motifs present in chRCC-specific sites by de novo motif analysis, limited by ATAC peaks. F, G H3K27ac profiles near FOXI1 and TFCP2L1, respectively, in two representative samples of each histology (chRCC, ccRCC, pRCC). H Two most significantly enriched nucleotide motifs present inccRCC specific sites by de novo motif analysis. P values are two-sided and unadjusted for multiple comparisons. I, J H3K27ac profiles near ETS-1, and HNF1B, respectively, in two representative samples of RCC histology. K Three most significantly enriched nucleotide motifs present in pRCC-specific sites by de novo motif analysis. P values are two-sided and unadjusted for multiple comparisons. RCC renal cell carcinoma, chRCC chromophobe RCC, ccRCC clear cell RCC, pRCC papillary RCC.
De novo motif analysis of H3K27ac peaks enriched in each subtype identified four motifs that were highly enriched in chRCC, including one resembling a forkhead motif, and another resembling the motif associated with TFCP2 (Fig. 2E). FOXI1, a forkhead family TF, and TFCP2L1, closely related to TFCP2, have both been implicated in the development of intercalated cells of the kidney, the putative cell of origin of chRCC36. FOXI1 and TFCP2L1 gene loci were characterized by high H3K27ac signal in chRCC compared to the absent or markedly lower signal in ccRCC and pRCC (Fig. 2F, G). Motif enrichment analysis of putative ccRCC-specific enhancers identified a motif resembling the basic Leucine Zipper (bZIP) BATF motif, and another resembling the basic helix-loop-helix (bHLH) TF family member HIF2α (also known as EPAS1) (Fig. 2H). ETS1, in the bZIP family with BATF, has been implicated in von-Hippel Lindau (VHL)-dependent ccRCC tumorigenesis37, and was highly marked by H3K27ac in ccRCC, less so in pRCC, and not in chRCC (Fig. 2I). EPAS1 is well-known to be dysregulated in ccRCC due to the loss of the VHL protein function. It is the main driver of ccRCC and is upstream of multiple critical oncogenic pathways. Recent clinical trials with HIF2 inhibitors have shown clinical activity in advanced ccRCC patients38,39. The top-scoring pRCC-specific motif corresponded to HNF1β (Fig. 2J, K). HNF1β is a member of the homeodomain-containing superfamily of TFs, which is involved in nephrogenesis40, and was highly marked by H3K27ac specifically in pRCC (Fig. 2K).
RCC has recurrent mutations in genes involved in the chromatin remodeling and histone methylation pathways7,8,41,42,43,44. In our dataset, loss-of-function single-nucleotide variant and copy number alterations (see “Methods”) in such commonly mutated genes in RCC (VHL, PBRM1, BAP1, and TP53) did not correlate with global acetylation differences in both supervised and unsupervised analyses, albeit the small sizes were small. (Supplementary Fig. S4A–S4D and Supplementary Data 3 and 4).
Specific master transcription factors define RCC subtypes
We next sought to systematically identify candidate histology-specific master TFs that define the three subtypes of RCC. Master TFs typically bind within SEs17,45, are often regulated by SEs, and regulate one another in a transcriptional core regulatory circuit (CRC)46. We employed an integrative approach47,48, leveraging the RNA-Seq, ChIP-Seq, and ATAC-Seq data to identify candidate histology-specific master TFs (Fig. 3A). This approach aims to utilize orthogonal information to identify a consensus set of master TFs. We combined (1) expression data of differentially expressed TFs across RCC histologic subtypes; (2) TFs specific to RCC histologic subtypes relative to other cancer types (CaCTS)49; (3) differential SE-associated TFs among RCC histologic subtypes; and (4) TFs with histology-specific connectivity in regulatory cliques (see “Methods”). These four analyses identified more than 200 candidate TFs showing a histology-specific association in one or more analysis. We prioritized candidates for downstream validation by selecting those that were identified in more than one analysis (“Methods”, Supplementary Figs. S5A-S5J and S6 and Supplementary Data 5–11). This analysis highlighted 50 candidate histology-specific master TFs (N = 20, chRCC; N = 14, pRCC; N = 16, ccRCC) (Supplementary Data 12), including FOXI1, TFCP2L1, and DMRT2 for chRCC; EPAS1, ETS1, BARX2, ZNF395 for ccRCC, and HNF1B and NR2F2 for pRCC (Fig. 3B). SE ranking, gene expression, and CES of the 50 histology-specific TFs selected using this meta-analysis approach clustered the samples tightly according to their respective histologies (Supplementary Fig. S6A).
A Overview of the approach used to nominate histology-specific master TFs participating in CRC. B Heatmap integrating the 50 histology-specific TFs identified by the meta-analysis approach (CES, differential expression, SE rank analysis, and CaCTS). C Representative immunohistochemical stainings of indicated antibodies in samples from ccRCC, chRCC, and pRCC tumors. Four histology-specific master TFs are shown. The number of positive tumors/number of tumors examined are indicated below each image (for ccRCC, n = 4 for TF BHLHE41, n = 3 for TFs HNF1B, NKX6.1, and ZNF395. For chRCC, n = 3 for TFs BHLHE41, HNF1B, and ZNF395, n = 2 for NKX6.1. For pRCC, n = 3 for BHLHE41, NKX6.1, and ZNF395, n = 2 for HNF1B). Scale bar is 50 μm. D ChIP-seq binding profiles of EPAS1 across 6 ccRCC and 2 chRCC human tissue samples. E GREAT analysis of chRCC- enriched EPAS1 peaks relative to ccRCC. Two-sided P values are shown. F GREAT analysis of ccRCC-enriched EPAS1 peaks relative to chRCC. Two-sided P values are shown. E, F GREAT calculates statistical enrichments for the association between genomic regions and annotations. ChIP-seq chromatin immunoprecipitation sequencing, ATAC-seq assay for transposase-accessible chromatin sequencing, DEG differentially expressed genes, SE superenhancer, CaCTS cancer core transcription-factor specificity, CRC core regulatory circuitries, RCC renal cell carcinoma, chRCC chromophobe RCC, ccRCC clear cell RCC, pRCC papillary RCC. Source data are provided as a Source Data file.
To confirm whether similar patterns of TF-specificity are found in normal counterpart tissues, we turned to gene expression datasets from TCGA KICH, KIRC, and KIRP cohorts. GSEA analysis for histology-specific master TFs showed enrichment of clear cell and papillary RCC-specific transcription factors in tumors compared to normal (FDR corrected q-value <0.01). There was no significant enrichment of chromophobe RCC-specific transcription factors in tumors compared to normal. Of 50 master TFs, 34 (68%) had more significant expression in the tumor tissue compared to the normal counterpart (14/16 for KIRC, 10/14 for KIRP, and 10/20 for KICH, Supplementary Fig. S6B–S6D).
Clinical correlative analyses from CheckMate cohorts (009/10/025) showed that among the 50 master TFs, high BARX2 expression significantly correlated with improved overall survival in the entire cohort of patients with ccRCC (Supplementary Data 13 and Supplementary Fig. S6E) and in the subgroup treated with the anti-PD1, nivolumab (Supplementary Data 13 and Supplementary Fig. S6F).
To provide proof of concept validation at the protein level, we investigated the expression specificity and localization of master TFs by immunohistochemistry (IHC). We selected four representative master TFs that met the following criteria: (1) at least twofold change in gene expression of the histology-specific TF by bulk RNA sequencing (from TCGA) compared to the two other RCC histotypes, (2) high-quality antibodies for IHC, and (3) TF was not previously validated and implemented on a clinical level. For the 4 TFs (BHLHE41, HNF1β, NKX6.1, and ZNF395), we found significant changes in protein expression levels across histologic subtypes (Fig. 3C). More specifically, two ccRCC-specific master TFs (BHLHE41 and ZNF395), were highly expressed in ccRCC tumors. Both TFs had nuclear localization in four out of four and three out of three ccRCC samples, respectively. NKX6.1, a TF recently described as being expressed in chRCC, was detected nuclearly in two out of two chromophobe samples with no expression in either ccRCC or pRCC. We next confirmed histology-specific in vivo binding of the nominated master TF EPAS1 through examining the EPAS1 cistrome in ccRCC and chRCC. We characterized 2916 clear cell RCC-specific and 4564 chromophobe RCC- specific EPAS1-binding sites through performing EPAS1 ChIP on eight additional primary human tumors (n = 2 chRCC; n = 6 ccRCC, Fig. 3D). Subtype-specific EPAS1-binding sites were highly enriched for subtype-specific sites of H3K27ac. For instance, 2090 of the 4565 chromophobe-specific EPAS1-binding sites (46%) overlapped with chromophobe-specific H3K27ac sites (P < 2.2e-16). This supports the conclusion that these differential EPAS1-binding sites are non-random and biologically relevant, because they coincide with regulatory elements that segregate closely with histology. ccRCC-specific EPAS1-binding sites were enriched for immune cell and white blood cell activation pathways and chRCC-specific EPAS1-binding sites were enriched for metabolic processes and fatty acid activation (Fig. 3E, F).
To further validate our approach of the nominated master TFs in driving the transcriptional identity of the different RCC histologies, we manipulated TF expression in a ccRCC cell line. We hypothesized that overexpression of chRCC-specific TFs and suppression of ccRCC-specific TFs can shift the transcriptional landscape of the ccRCC cell line 786-O to become more chromophobe-like. FOXI1 scored as a chRCC-specific TF in 4/4 master TF analyses (Fig. 3B). Furthermore, FOXI1 is selectively expressed in intercalated cells (ICs), the putative cellular origin of chRCC36,50 and is more highly expressed in chRCC than other cancer subtypes in the TCGA dataset (Supplementary Fig. S7A). We also manipulated the expression of EPAS1 as a second target as it was highly specific for ccRCC in our integrative analysis (Fig. 3B and Supplementary Fig. S8B), and prior studies have characterized its role in the pathogenesis of ccRCC51. We overexpressed FOXI1 in the ccRCC cell line 786-O (FOXI1 OE); suppressed EPAS1 (EPAS1 KD) and simultaneously perturbed both genes in the same cell line (FOXI1 OE/EPAS1 KD) (Supplementary Fig. S7C–7E and Supplementary Data 14). Expression changes are described in Supplementary Fig. S8A–S8F and Supplementary Data 15–18. Gene set enrichment analysis (GSEA) of downregulated genes with FOXI1 OE/EPAS1 KD in 786-O showed enrichment of genes involved in the immune responses (Fig. 4A). Concomitant FOXI1 OE and EPAS1 KD in 786-O cells resulted in upregulation of chRCC-specific master TFs such as TFCP2L1 and NR3C2 and down-regulation of ccRCC-specific master TFs such as ETS-1 and ZNF395 compared to control cells. (Fig. 4B). Further supervised analysis showed that CAIX, a downstream target of EPAS1 was significantly overexpressed in the WT 786-O cell line compared to EPAS1 KO/FOXI1 OE cell line. In contrast, CD117, a well-known immunohistochemical marker of chRCC52, was overexpressed in the EPAS1 KO/FOXI1 OE cell line compared to WT 786-O cell line. Although CK7 expression is a supportive marker clinically for chRCC53, its expression was uniform across WT 786-O and manipulated cell line states (Fig. 4C). Of note, there were only 109 differentially expressed genes between the FOXI1 OE/EPAS1 KD cell and FOXI1 OE only cell line (P < 0.05, q < 0.01, Supplementary Fig. S8D and Supplementary Data 17).
A Downregulated GO biological terms in the cell line 786-O FOXI1 OE/EPAS1 KD. Two-sided adjusted P value corrected for multiple comparisons. B Rank order of differentially expressed TFs between 786-O CTRL and 786-O FOXI1 OE/EPAS1 KD by expression levels. Each TF dot represents one TF. Wald test from DESeq2 was used. Purple dots indicate adjusted P value <0.001 by DESeq Expression. C Gene expression levels of CAIX, CD117, and CK7 across different 786-O cell line conditions. D Heatmap showing the relative expression of the top 20 upregulated genes in ccRCC vs. chRCC and vice versa in the different cell line conditions. E GSEA analysis showing the top 100 upregulated genes from chRCC vs. ccRCC dataset in 786-O FOXI1 OE/EPAS1 KD vs. 786-O CTRL comparison. F GSEA analysis of the top 100 upregulated genes in ccRCC vs. chRCC in 786-O CTRL vs. 786-O FOXI1 OE/EPAS1 KD. CTRL control, KD knockdown, OE overexpression, TF transcription factor, GSEA gene set enrichment analysis, 786Oc1 CTRL1, 786Oc2 CTRL2, 786OD1 786-O FOXI1 OE/EPAS1 KD1, 786OD2 786-O FOXI1 OE/EPAS1 KD2, NES Normalized Enrichment Score, FDR false discovery rate, FC fold change, RCC renal cell carcinoma, chRCC chromophobe RCC, ccRCC clear cell RCC, pRCC papillary RCC.
We then compared differentially expressed genes between our experimental conditions in 786-O and our RCC tumor expression data (Fig. 4D and Supplementary Fig. S8G). Defining chRCC and ccRCC gene sets as the top 100 upregulated genes for each histology relative to the other, we showed that the chRCC gene set is enriched among upregulated genes in 786-O FOXI1 OE/EPAS1 KD cell line vs. control (Fig. 4E), and the ccRCC gene set is enriched among upregulated genes in 786-O control vs. FOXI1 OE/EPAS1 KD cell line (Fig. 4F). The FOXI1 OE/EPAS1 KD 786-O cell line demonstrated significantly higher expression of chRCC-specific candidate master TFs, TFCP2L1, GATA2, DDIT3, NKX6-1, and lower expression of ccRCC-specific candidate master TFs, ZNF395 and TSC22D3. Differential TFs in the 786-O FOXI1 OE/EPAS1 KD vs. 786O CTRL comparison overlapped more significantly with differential TFs from the chRCC vs. ccRCC human samples comparison than from the ccRCC vs. pRCC human sample comparison (Supplementary Data 19). In summary, these data show that overexpression of a single chRCC master transcription-factor candidate, FOXI1, in the ccRCC cell line 786-O, with or without knockdown of EPAS1, led to marked expression changes driving the cell line to be more like a chRCC cell line without any modification to the set of mutations present in 786-O cells.
Allelic imbalance annotates germline RCC risk variants
Allelic imbalance, the differential allelic representation of heterozygous single-nucleotide polymorphisms (SNPs) in ChIP-seq reads, provides an in vivo comparison of cis-regulatory activity between two haplotypes (Fig. 5A). As such, allelic imbalance can highlight the functional relevance of candidate causal variants22,32,54,55,56,57 at loci that have been associated with RCC risk through GWAS. A key advantage of profiling epigenomes from many individuals is the ability to capture heterozygous sites and to measure the effect of TF binding on regulatory elements through analysis of allelic imbalance. To this end, we assessed chromatin allelic imbalance in ChIP-seq reads from these RCC samples in order to nominate causal risk SNPs from a genome-wide association study of RCC. We applied stratAS32 to H3K27ac ChIP-seq data from 20 ccRCC and 6 pRCC (“Methods”). We identified 10,605 chromatin-imbalanced H3K27ac peaks—which we defined as peaks with one or more imbalanced SNPs after correction for multiple hypothesis testing—across 183,099 peaks tested in the combined ccRCC and pRCC sample sets (Fig. 5B and Supplementary Data 20).
A Schematic of allelic imbalance at heterozygous SNPs. B Schematic showing subset of allelically imbalanced H3K27ac peaks from total H3K27ac peaks in RCC (pRCC and ccRCC) samples. C AI at rs4903064 (chr14:73279420; DPF3) in RCC. D rs4903064 C-allele sequence context creates a HIF-2a binding site. E RPKM values for the EPAS1 ChIP signal in the peaks around rs4903064 in 43 samples. Genotype is shown on the X axis. Box boundaries correspond to 1st and 3rd quartiles; whiskers extend to a maximum of 1.5× the interquartile range. Two-sided unadjusted p-values are shown. Statistical analysis was performed using Kruskal–Wallis test. F Overlayed EPAS1 and H3K27ac ChIP-Seq tracks for seven samples within ~10 Kb of genomic coordinates of rs4903064. Individual sample genotypes are shown. G Venn diagram showing overlap of imbalanced H3K27ac peaks with ATAC-seq peaks in RCC. H Enrichment of risk SNPs from GWAS for RCC and other diseases in H3K27ac peaks with differential allelic imbalance, compared to all H3K27ac peaks. Empiric one-sided P value is indicated (unadjusted for multiple comparison). Statistical analysis was performed using Kruskal–Wallis test. GWAS risk SNPs rs7132434 and rs4733579 demonstrating allelic imbalance in RCC. I AI at Chr12 SNP rs7132423 and Chr8 SNP rs4733579 shown. J AI at Chr2 SNPs within EPAS1. rs46552601 and rs46541176 shown. K Chr2 SNPs within EPAS1, with imbalance plots split by histology. Adjusted Q values for imbalance are indicated. ChIP-seq chromatin immunoprecipitation sequencing, Ca cancer, HTN hypertension, Dx diagnosis, T2D type 2 diabetes mellitus, AID autoimmune disease, C0 reference allele, C1 alternate allele, AI allelic imbalance, RCC renal cell carcinoma, chRCC chromophobe RCC, ccRCC clear cell RCC, pRCC papillary RCC, SNP single-nucleotide polymorphism.
We hypothesized that chromatin AI peaks correspond to regulatory elements that are bound by master TFs. At these elements, the presence of trans-acting factors (i.e., master TFs) may result in regulatory element activity that are observable as H3K27ac AI peaks. One example is rs4903064 (chr14:73279420; 14q24.2), an eQTL for DPF358 (Fig. 5C). rs4903064, which is allelically imbalanced in our H3K27Ac dataset, has been shown to be a GWAS risk variant in all three histologic subtypes (ccRCC, chRCC, and pRCC), and the altered C-allele of this SNP has been suggested to create a HIF-binding motif (Fig. 5D)59. To confirm this, we used EPAS1 transcription-factor ChIP-Seq data in an independent cohort of 43 samples derived from 23 patients with ccRCC to study the effect of the GWAS risk variant rs4903064 (see “Methods”) on EPAS1 binding. Analysis of 11 primary, 21 metastatic ccRCC samples, and 11 normal renal tissue samples showed that tumors with homozygous C/C alleles were significantly enriched for EPAS1 peaks (Fig. 5E, F) in both tumors and normal tissue and that rs4903064 is a EPAS1 cQTL. This confirms prior literature that the C-allele creates a HIF-binding site59.
We applied our chromatin AI analysis to annotate risk SNPs identified by a GWAS for RCC58. Chromatin AI peaks were highly enriched (16.7-fold) for RCC GWAS risk variants as assessed by LD score regression analysis (P = 1.9 × 10−4)60. Subsetting imbalanced H3K27ac peaks to regions of accessible chromatin where TFs are likely to bind (as assessed by ATAC-Seq from RCC tissues61) resulted in substantial additional enrichment (43.2-fold; P = 7.0 × 10−6, Fig. 5G, H). This enrichment represents more than fivefold that of the total set of H3K27ac peaks, which are themselves enriched 8.1-fold (P = 1.2 × 10−4). By contrast, multiple other GWAS phenotypes from the UK Biobank62 showed substantially less enrichment compared to RCC GWAS SNPs, indicating the specificity of this enrichment for RCC (Fig. 5H).
Using this method, we were able to fine-map a total of 30 risk SNPs (Supplementary Data 21), with some examples highlighted. Rs7132434, located on chr 12, has been characterized as a functional variant that alters AP-1 binding leading to upregulation of BHLHE41, thus promoting tumor growth through induction of IL-1163 (Fig. 5I). Another example is rs4733579 which is located on chr 8 and is in LD with rs35252396 (Fig. 5I). The latter has been shown to contribute to RCC susceptibility through regulating MYC and PVT1 expression64. Of note, rs35252396 is an indel and therefore cannot be assessed by stratAS. Our analysis also highlighted rs46552601 and rs46541176 (Fig. 5J), both located on chr2p21 within the EPAS1 gene, where at least 59 SNPs have been fine mapped. These two SNPs are allelically imbalanced by H3K27Ac peaks only in ccRCC but not in pRCC, consistent with the specific role of EPAS1 (HIF2-α) in ccRCC pathogenesis (Fig. 5K). Further work is recommended to functionally validate the mechanism of these two SNPs in ccRCC pathogenesis. Finally, rs4765623 located in the SCARB1 locus has been recently validated in RCC progression32.
TFs can read the genetic code and bind cis-regulatory elements to activate or repress gene expression. We hypothesized that loci with significant chromatin AI are likely harbingers of TF binding and thus are associated with the regulation of gene expression in an allele-specific manner. We leveraged chromatin AI from our ccRCC H3K27Ac dataset and allele-specific expression (ASE) using RNA-seq data from the KIRC TCGA cohort. SNP loci with at least 50 reads in both the ccRCC H3K27Ac dataset and RNA-seq KIRC TCGA samples were retained. Retained SNPs were classified as chromatin allelically imbalanced or allelically balanced in ccRCC. Chromatin allelically imbalanced SNPs harbored H3K27ac peaks with a significant skew towards the alternate allele compared to the wild-type allele (P < 0.05). For the 22,162 chromatin allelically imbalanced SNPs, we matched a background set of chromatin allelically balanced SNPs lying within the H3K27ac consensus peak set. Allele-specific expression of genes whose transcription start site lies within 50 Kb of the chromatin allelically imbalanced and balanced SNPs was analyzed in the TCGA KIRC dataset (N = 412) with a significance cutoff of P-adjusted <0.01. Of the unique genes analyzed, chromatin allelically imbalanced SNPs were significantly more likely to lie within 50 kb of a gene with ASE (1170/2646, 44%) compared to the background set of chromatin allelically balanced SNPs (65/940, 6.9%, P = 8.4e−25, Supplementary Data 22–25).
Discussion
In this study, we report a compendium of ATAC-Seq, RNA-Seq, and histone modification data across the three most common RCC subtypes. Our goal was to describe enhancer programs in primary human samples from patients with RCC and to identify histology-specific epigenetic mechanisms that may underlie differential clinical features and prognosis. Critically, this resource includes chRCC, an understudied cancer type with scarce in vitro and in vivo models and molecular profiling data, and no proven therapeutic strategies for metastatic disease.
In line with histological, transcriptional and mutational data, ccRCC, pRCC, and chRCC exhibit distinct epigenetic landscapes. chRCC clearly separated from ccRCC and pRCC, both by H3K27ac and H3K4me2 landscapes. Motif-based analysis revealed the enrichment of distinct TF binding sites in H3K27ac peaks in ccRCC vs. pRCC, suggesting that distinct TFs were active and driving histology-specific regulatory pathways and downstream transcriptional programs. Using EPAS1 TF ChIP-Seq as an example, we demonstrate that histology-specific binding of EPAS1 mediates independent pathways in different RCC histologies.
Prior work in medulloblastoma and ependymoma demonstrated how molecularly-defined cancer subgroups exhibit specific core regulatory circuitries65. Herein, we nominated 50 histology-specific TFs in RCC based on a convergence of evidence, including high-level expression, specificity of expression across RCC subtypes, superenhancer association, and connectivity. Our experimental results demonstrate that upregulating a chRCC-specific TF, FOXI1, can partially reprogram a ccRCC cell line, 786-O, towards a chRCC transcriptional phenotype; consistent with results from other tumor and normal cell types showing that master TFs govern the transcriptional identity of the cell or tissue.
Our work furthermore sheds light on multiple subtype-specific master TFs, the role of which is yet be exactly determined. HNF1β, a candidate pRCC master TF, has been shown to be expressed in the embryonic kidney and in pRCC, and it is upregulated in pRCC compared to other kidney cancer histologies7,66,67. HNF1β gene is located on chromosome 17 and is frequently impacted by copy number gains in pRCC66 further providing grounds for its role in pRCC pathogenesis. ETS-1, a candidate ccRCC master TF known to have a role in cancer pathogenesis as well as angiogenesis and hematopoietic stem cell differentiation, is another example68,69,70,71. As a proto-oncogenic factor, ETS-1 is capable of activating genes associated with angiogenesis, metastasis and invasive behavior in multiple tumor types72,73,74. In addition, ETS-1 expression correlates with microvessel density in some non-glial tumors and is an independent negative prognostic marker in different tumor entities such as breast, ovarian, pancreatic, and colorectal cancers72,75,76. Nonetheless, the role of ETS-1 in RCC has mainly focused on its interaction with EPAS1 (HIF2-α)77. Here, we show that ETS-1 may play a role as a master TF specific to ccRCC and not chRCC or pRCC. Future studies should focus on understanding the global epigenomic effect of ETS1 in driving ccRCC pathogenesis.
Using clinical trial data (CheckMate 009/010/025), we suggest that high BARX2 expression may be associated with longer survival among patients treated with nivolumab but not everolimus. Prospective data in the metastatic ccRCC space are needed to confirm the role of BARX2 expression as a predictive biomarker in patients treated with immune checkpoint inhibitors.
In routine practice, pathologists use a myriad of targets to differentiate across RCC histotypes. Our approach has the potential to narrow master TFs to clinically meaningful ones and augments the current armamentarium with additional TFs, including BHLHE41 (ccRCC) and NKX6.1 (chRCC) that can be stained to better characterize tumors.
BHLHE41 was previously shown in triple-negative breast cancer to counteract expression of HIF-target genes by promoting HIF proteasomal degradation in a process independent of VHL or hypoxia78. Elevated protein expression of BHLHE41 in ccRCC has not been described previously. Using a systematic approach, we found histology-specific protein expression of BHLHE41 in ccRCC compared to pRCC and chRCC. In ccRCC where HIF activation is foundational, further investigation of the relation between BHLHE41 and HIF activation is warranted.
GWAS have identified hundreds of putative cancer-risk loci79. These studies have led to the conclusion that cancer is driven by thousands of variants with individual small effects, and that cancer-risk variants predominantly lie in non-coding regions of the genome80,81. More recently, it has been realized that GWAS heritability is enriched at variants that lie in tissue-relevant epigenomic features20,60,82,83,84. Emerging evidence suggests that many GWAS variants function by altering an existing or creating a TF binding site. This in turn regulates the local chromatin regulatory landscape by altering the enhancer landscape and thus modulating the expression of target genes85. We used our population-scale epigenomic data to further empower the discovery of cis-regulatory elements that may mediate between the risk variant and target gene and thus constrain putative targets for experimental validation. We showed that RCC GWAS risk SNPs are enriched in regulatory elements, consistent with prior studies20,60,82,83,84. Profiling these regulatory elements across multiple (genetically diverse) individuals allowed us to identify regions of chromatin allelic imbalance, which are further enriched for GWAS SNPs, and subsetting these regions to ATAC-seq peaks where TFs are likely to bind, showed even more enrichment for GWAS SNPs. In RCC, there are 13 recognized risk loci, but the causal variants have been identified for few, and underlying mechanisms of risk remain elusive for most regions58. Our approach can help pinpoint causal SNPs and provide candidate mechanisms by identifying small numbers of SNPs that may alter binding sites for specific TFs, an example of which is the rs4903064 locus. Recent work around the rs4903064 locus59 showed that it had an allele-specific effect on DPF3 expression in ACHN and HEK293T cell lines as assessed by massively parallel reporter assay, confirmatory luciferase assays, and eQTL analyses. The rs4903064-C RCC risk allele was shown to create a HIF-binding site and enhance gene expression. Increased expression of DPF3 conferred a growth advantage to cells by at least two pathways: inhibition of apoptosis via CEMIP and activation of STAT3 via IL23R. The authors also showed that DPF3-overexpressing cells showed higher T-cell mediated cytotoxicity compared to controls. In a separate effort, Protze et al.86 used 23 tumor tissue specimens and two primary ccRCC cell lines and showed that the risk SNP is located within an active enhancer region, in turn creating a EPAS1-binding motif. They also showed that HIF-mediated DPF3 regulation depends on the presence of the C-risk allele. Moreover, DPF3 deletion in proximal tubular cells decreased cell growth, suggesting a role for DPF3 in cell proliferation.