Enlibio,Your biological research partner
Xinqidi Biotech Co.,Ltd,Wuhan,China 2008-2022
R&D 14th year

Functional dissection of inherited non-coding variation influencing multiple myeloma risk

Issuing time:2022-01-17 15:49

Abstract

Thousands of non-coding variants have been associated with increased risk of human diseases, yet the causal variants and their mechanisms-of-action remain obscure. In an integrative study combining massively parallel reporter assays (MPRA), expression analyses (eQTL, meQTL, PCHiC) and chromatin accessibility analyses in primary cells (caQTL), we investigate 1,039 variants associated with multiple myeloma (MM). We demonstrate that MM susceptibility is mediated by gene-regulatory changes in plasma cells and B-cells, and identify putative causal variants at six risk loci (SMARCD3, WAC, ELL2, CDCA7L, CEP120, and PREX1). Notably, three of these variants co-localize with significant plasma cell caQTLs, signaling the presence of causal activity at these precise genomic positions in an endogenous chromosomal context in vivo. Our results provide a systematic functional dissection of risk loci for a hematologic malignancy.

Introduction

Genome-wide association studies (GWAS) have identified tens of thousands of sequence variants associated with human diseases and traits1, yet our understanding of the underlying mechanisms is still limited. Each association signal is usually represented by tens to hundreds of variants in linkage disequilibrium (LD). The vast majority of these variants map to noncoding regions of the genome, and likely act by altering gene expression2,3,4. For most signals, however, the causal variants, their target genes, and target cell types remain unknown.

Multiple myeloma (MM) is defined by uncontrolled, clonal growth of plasma cells, usually in the bone marrow. It is a common blood malignancy, with strong epidemiological support for inherited susceptibility5. While genome-wide association studies have identified 24 risk loci6,7,8,9,10,11, the causal variants remain largely unknown12,13. Further, plasma cells can be readily isolated from MM patients using routine methods, and cell lines appropriate for the investigation of MM biology exist. For these reasons, MM is an attractive model disease for deciphering the functional basis of risk loci. To our knowledge, no systematic functional dissection of risk loci for a hematologic malignancy has been reported14,15,16,17,18.

Here, we carried out an integrative study combining massively parallel reporter assays (MPRA), expression analyses (eQTL, meQTL, and PCHiC), and chromatin accessibility quantitative locus (caQTL) analyses in primary cells to investigate 1039 variants in linkage disequilibrium with multiple myeloma (MM) lead variants. We demonstrate that MM susceptibility is mediated by gene-regulatory changes in plasma cells and B-cells, and identify putative causal variants at six risk loci. Notably, three of these variants co-localize with significant plasma cell caQTLs, signaling the presence of causal activity at these precise positions in an endogenous chromosomal context in vivo. Our results provide a systematic functional dissection of risk loci for a hematologic malignancy.

Results

Designing an MPRA to screen MM risk variants

To identify putative causal variants, we first designed an MPRA14,19,20,21 to screen 1039 variants in high LD (r2 > 0.8) with MM lead variants for transcriptional activity (Fig. 1a and Supplementary Table 1). For each variant, we designed twelve 120-bp oligonucleotide sequences corresponding to reference and alternative alleles in six genomic contexts (both strands × three sliding windows with the variant at −20, 0, and +20 bp from the center). Sequences were coupled to a reporter gene with random 20-bp sequence barcodes 3′ of its open reading frame. Following transfection into cell lines, the transcriptional activity of each construct was measured by determining the barcode representation in reporter mRNA relative to DNA (Fig. 1b). Plasmid sequencing identified 1.73 × 106 unique barcodes tagging 12,378 (99.2%) of the 12,468 designed oligonucleotides (Fig. 1c). As a positive control, we included the RUNX3 variant rs188468174, which influences immunoglobulin (Ig) levels and exhibits luciferase activity across a broad range of MM cell lines22.

Fig. 1: Screening assays to identify MM risk variants for transcriptional activity.

a Manhattan plot of the largest genome-wide association study on MM to date, a meta-analysis totaling 9974 MM cases and 247,556 controls of European ancestry11. The 23 indicated loci associate with MM at P < 5 × 10−8. The 11q13.3 (CCND1) locus specifically associates with risk of t(11;14)[IGH/CCND1] translocation MM56. Lead variants at each locus are detailed in Supplementary Table 1. b We employed MPRA to screen variants in high LD (r2 > 0.8) with MM lead variants for transcriptional activity. For each variant, we designed twelve 120-bp oligonucleotide sequences representing the reference and alternative alleles in six genomic contexts (positive and negative DNA strand × three sliding windows with the variant positioned at −20, 0, or +20 bp from the center). The synthesized oligonucleotides were coupled to a synthetic reporter gene with 20-nt random sequence barcodes 3′ of its open reading frame. c Sequencing of the final plasmid library identified 1.73 × 106 barcodes mapping to 12,378 of the 12,468 designed oligonucleotides. The histogram shows the numbers of barcodes representing each oligonucleotide (median 103). dFollowing transfection of the library into cell lines, the transcriptional activity of each construct was measured by quantifying the barcode representation in reporter mRNA relative to DNA by sequencing. Barcode activity was quantified as log2(1+#RNAi)/(1+#DNAi) where #RNAi and #DNAiare the read counts for barcode i normalized to counts per 10 million reads. We performed MPRA in three replicates in each cell line. The plot shows the correlation of barcode activity estimates between two L363 replicates.

Identification of causal cell types for MM susceptibility

Since reporter assays can show cell type-dependent activity, MPRA should ideally be performed in an appropriate cellular model. We therefore carried out computational analyses to identify cell types where MM risk variants likely act. First, using ATAC-seq data for blood cell populations23, we found an enrichment of risk variants in genomic regions with accessible chromatin in plasma cells and total mature B-cells (Supplementary Fig. 1). Second, investigating blood cell populations for expression23,24,25,26,27,28 of genes located at MM risk loci, we identified plasma cells and total mature B-cells as the most enriched cell types (Supplementary Fig. 2). Third, using gene expression profiles of CD138+ plasma cells isolated from the bone marrow of 2650 MM patients12,29,30,31, we identified cis-eQTLs in LD with ten risk alleles (Supplementary Table 2). Additional cis-eQTLs were found in whole blood (Supplementary Table 3) or in CD19+ total mature B-cells isolated from 758 random blood donors (Supplementary Table 4). Fourth, since plasma cells are responsible for producing Ig, we tested MM lead variants for association with blood IgA, IgG, and IgM levels22. This revealed enrichments of association signal within the set of 24 MM lead variants for all three Ig isotypes (binomial test P = 6.8 × 10−5 for IgA, P = 0.02 for IgG, P = 0.004 for IgM for the enrichment of association P values <0.05), as well as individually significant associations (Supplementary Fig. 3), including for the SMARCD3, WAC, and ELL2associations, which also showed plasma cell cis-eQTLs (Supplementary Table 2). Collectively, these data are consistent with many MM risk variants acting by altering gene regulation in plasma cells, while others may act in other cell populations, including B-cells.

Identification of MM risk variants influencing transcription

Focusing our analysis on plasma cells, we performed MPRA in the MM plasma cell lines L363 and MOLP8. Each cell line was assayed in three replicates (Fig. 1d). Based on barcode activity estimates, we calculated a log2 score for each variant reflecting the transcriptional activity of the alternative relative to the reference allele, averaged across genomic contexts and replicates32. L363 and MOLP8 scores showed a positive correlation (Fig. 2a), did not display strand bias (Fig. 2b), and additional validation of 20 selected variants showed a positive correlation with luciferase data (Fig. 2c, Supplementary Fig. 4 and Supplementary Table 5). Moreover, variants with strong MPRA scores were enriched in chromatin accessibility regions in primary plasma cells, consistent with our assay selecting variants with endogenous regulatory activity (Fig. 2d and Supplementary Fig. 5).

Fig. 2: Overarching analysis of screening data.

We performed MPRA in the MM plasma cell lines L363 and MOLP8. a Variant log2 scores for the L363 and MOLP8 screens. For each variant, log2 reflects the transcriptional activity of the alternative relative to the reference allele. Scores were calculated based on barcode activity estimates in all six genomic contexts (i.e., across both strands and all three sliding windows) and three replicates per cell line. Variants with strong effects (absolute log2 score >0.2) in either screen are indicated in red. Pearson r and two-sided P values are shown. b Calculating log2 scores using either positive-strand (x-axis) or negative-strand constructs (y-axis) for the same variant, we did not observe strand bias. cAs an additional assay validation step, we carried out luciferase experiments for 20 variants, showing a significant positive correlation between the MPRA effect (x-axis) and the luciferase effect (y-axis). d g-chromVAR analysis of screened variants, weighted by their L363 log2 scores showed enrichment of variants with strong MPRA scores in genomic regions with accessible chromatin in plasma cells, consistent with our assay selecting variants with endogenous regulatory activity. eNumbers of significant variants with FDR <5% in the two cell lines. f Numbers of variants showing both FDR <5% and strong effects (absolute log2 score >0.2).

In L363, 142 variants were significant (FDR <5%), including 33 with strong effects (absolute log2 score >0.2). In MOLP8, 28 were significant, including 21with strong effects (Fig. 2e, f and Supplementary Data 1). The higher number of significant variants in L363, compared to MOLP8, cells was congruent with a higher transfection efficiency (54% for L363 versus 15% for MOLP8) and higher post-transfection viability (90% for L363 versus 65% for MOLP8). In total, 23 variants were significant in both screens, and eight of these showed concordant plasma cell cis-eQTLs, making them putative causal variants that were selected for follow-up (Table 1, Fig. 3, and Supplementary Figs. 4, 6). The other 15 had discordant or no plasma cell cis-eQTLs, either because of technical limitations (e.g., TERC was not in our eQTL data; the JARID2 and RUNX3 variants are rare), or because these alter gene expression in another cell state (e.g., TNFRSF13B is primarily expressed in switch-memory B-cells33 and had a cis-eQTL in total mature B-cells; Supplementary Table 4).

Table 1 Variants showing FDR <5% in both L363 and MOLP8 cells, with MM risk alleles underlined.
Fig. 3: MPRA data for identified variants.

Individual MPRA barcode activity estimates for the eight variants in Table 1 also showed concordant plasma cell cis-eQTLs. The data have been grouped by allele (reference allele to the left; an alternative to the right), DNA strand (+ or −), and sliding window (variant at −20, 0, or +20 bp from the center of the 120-bp oligonucleotide representing the genomic context). Blue dots represent individual barcode activity estimates. The bottom, middle, and top of each box plot represent the 25th, 50th, and 75th percentiles. The whiskers represent the non-outlier minimum and maximum values, located at 1.5 times the interquartile range from the bottom and top of the box, respectively. The numbers by the brackets are P values for the two-sided Student’s t-test. The luciferase validation data for these eight variants are shown in Supplementary Fig. 4. The individual barcode activity estimates for MOLP8 cells, as well as individual barcode activity estimates for the other variants in Table 1, are shown in Supplementary Fig. 6.

Functional characterization of MPRA-functional variants

We next investigated potential mechanisms of action for the selected variants. rs78740585 maps to SMARCD3 (Fig. 4a). In humans, SMARCD1, SMARCD2, and SMARCD3 encode alternative, mutually exclusive 60-kD subunits of the SWI/SNF nucleosome remodeling complex34,35,36. Incorporation of either SMARCD subunit variant into the complex influences its activity37. In blood, SMARCD3 is primarily expressed in granulocytes and monocytes whereas basal expression in plasma cells is very low; instead these cells exhibit high expression of SMARCD1 and SMARCD2 (Fig. 5a). By contrast, the MM risk allele associates with upregulation of SMARCD3 in plasma cells (Supplementary Tables 2 to 4). rs78740585-A creates a binding site for IRF4 (Fig. 5b, c, Supplementary Fig. 7, and Supplementary Data 2), a key plasma cell transcription factor essential for the survival of MM cells38. Knockdown of IRF4 attenuated rs78740585-A luciferase activity (Fig. 5d, e). Furthermore, analysis of promoter-capture Hi-C (PCHi-C) data for three MM plasma cell lines showed a chromatin looping interaction between the rs78740585 region and the SMARCD3 promoter (Fig. 4a and Supplementary Fig. 8a). Collectively, these data are consistent with rs78740585-A effecting ectopic SMARCD3 expression in plasma cells by introducing a new IRF4 site into an enhancer, In theory, increased levels of SMARCD3 protein could lead to the displacement of SMARCD1 and SMARCD2 in the SWI/SNF complex through stoichiometric competition, potentially impacting on SWI/SNF-dependent gene expression.

Fig. 4: Genomic context of identified putative causal variants.

Based on our functional screens, we identified eight putative causal variants (highlighted in red and with dashed lines) across six loci. The figure shows their association P values (Fig. 1a), with the lead SNP indicated as a triangle, along with ATAC-seq data for plasma cells (blue) and 11 ChromHMM states in the MM plasma cell line KMS11. We also generated PCHi-C data for the MM plasma cell lines KMS11 (yellow), KMS12 (orange), and MM1S (red), and identified chromatin looping interactions using the CHiCAGO tool. Interactions with −log10(CHiCAGO P score) ≥2 are shown. a At the SMARCD3 locus, we detected a chromatin looping interaction between the rs787404585 region and the SMARCD3 promoter. b rs2790444 at WAC, located close to the promoter within the PCHi-C bait region. c rs3777189 and rs3777183-rs3777182 at ELL2, where rs3777182 and rs3777183 are located only 17 bp apart. We detected a chromatin looping interaction between the rs3777183-rs3777182 region and the promoter. d No looping interactions were detected at CDCA7L. e At the PREX1 locus, we detected a looping interaction between the rs6066832 region and the PREX1 promoter. f No looping interactions were detected for the CEP120 association. Vertical lines indicate variant positions. Coordinates are hg38.

Fig. 5: Characterization of rs78740585.

a Heat map showing the expression of the SMARCD gene family in blood cells. Notably, expression of SMARCD3 in plasma cells is normally very low; the MM risk allele upregulates SMARCD3 in this cell type (Supplementary Table 2). The color scale is log2-transformed, median-centered RNA-seq data23. b Motif analysis predicted that the SMARCD3 high-expressing rs78740585-A allele creates a binding site for IRF4. Arrow indicates the altered recognition base. c Electromobility shift assay showing selective binding of IRF4 to rs78740585-A probe. Arrow indicates supershift with an antibody towards IRF4. d siRNA-knockdown of IRF4 reduced luciferase activity for rs78740585-A relative to rs78740585-G in L363 cells. The y-axis indicates the log2 ratio of the luciferase/renilla signal for rs78740585-A relative to the rs78740585-G construct. The bottom, middle, and top of each box plot represent the 25th, 50th, and 75th percentiles. The whiskers represent the non-outlier minimum and maximum values, located at 1.5 times the interquartile range from the bottom and top of the box, respectively. The P value is for the two-sided Student’s t-test. e Western blot confirming knockdown. LP labeled probe, NE L363 nuclear extract, ULP unlabeled probe, α-IRF4 antibody against IRF4.

rs2790444 maps to the autophagy gene WAC (Fig. 4b). Rare loss-of-function variants in WACcause De Santo-Shinawi syndrome39, which can feature hypogammaglobulinemia40. The common MM risk allele associates with increased levels of IgM in the blood (Supplementary Fig. 3) and downregulation of WAC in plasma cells (Supplementary Table 2). rs2790444 maps close to the WAC transcription start site, within the PCHi-C bait region (Fig. 4b and Supplementary Fig. 8b). We found that rs2790444-T creates a binding site for the POU2F1 transcription factor and knockdown of POU2F1 attenuated rs2790444-T luciferase activity (Fig. 6a–d, Supplementary Fig. 9, and Supplementary Data 2). POU2F1 has a dual role in the regulation of gene expression; recruiting the nucleosome remodeling and deacetylase (NuRD) complex, POU2F1 promotes methylation and suppressive histone modifications, while in the context of MAPK signaling it recruits the KDM3A demethylase, promoting pro-transcriptional effects41. Consistent with pro-transcriptional activity, we detected a significant plasma cell cis-meQTL at WAC with rs2790444 (P = 1.37 × 10−8), with rs2790444-T being associated with reduced methylation (Supplementary Table 6). Moreover, CRISPR/Cas9 deletion of a 139-bp region harboring rs2790444 downregulated WAC, supporting functional coupling between the variant-harboring region and the transcriptional regulation of WAC (Fig. 6e, f). These data are compatible with rs2790444-T creating a promoter-proximal POU2F1 site, upregulating WAC through decreased methylation.

Fig. 6: Characterization of rs2790444.
figure6

a Motif analysis predicted that the WAC high-expressing allele rs2790444-T creates a new binding site for POU2F1. Arrow indicates the altered recognition base. b Electromobility shift assay showing selective binding of POU2F1 to rs2790444-T probe. Arrow indicates supershift with an antibody towards POU2F1. c siRNA-knockdown of POU2F1 attenuated luciferase activity for rs2790444-T relative to rs2790444-C in L363 cells. The y-axis indicates the log2 ratio of the luciferase/renilla signal for rs2790444-T relative to the rs2790444-C construct. The bottom, middle, and top of each box plot represent the 25th, 50th, and 75th percentiles. The whiskers represent the non-outlier minimum and maximum values, located at 1.5 times the interquartile range from the bottom and top of the box, respectively. d Western blot confirming knockdown. e Quantitative PCR showing expression of WAC relative to control in MOLP8 cells upon dual-sgRNA CRISPR/Cas9 deletion of a 139-bp region harboring rs2790444, heterozygous for rs2790444. Blue bars indicate the averages of the individual measurements. f Agarose gel confirming deletion of the targeted region. LP labeled probe, NE L363 nuclear extract, ULP unlabeled probe, α-POU2F1 antibody against POU2F1. The P values is for the two-sided Student’s t-test.

rs3777182, rs3777183, and rs3777189 map to ELL2 encoding a key protein in the super-elongation complex that drives Ig synthesis42,43,44,45 (Fig. 4c). The MM risk allele downregulates ELL2 in plasma cells and Ig levels in blood8,12. Recently, we nominated rs3777189 as causal using non-systematic approaches and demonstrated that it changes a MAFF/G/K binding site12. In our MPRA screen, we now identify rs3777189 as the most active variant within its LD block, providing additional, unbiased evidence for causality. In addition, we identify rs3777182 and rs3777183 as previously unappreciated regulatory variants within the ELL2 LD block. Analyzing our PCHi-C data, we identified a chromatin looping interaction between the rs3777183-rs3777182 region and the ELL2 promoter (Fig. 4cand Supplementary Fig. 8c), and predicted several altered motifs (Supplementary Data 2). CRISPR/Cas9 deletion of a 141-bp region harboring rs3777183-rs3777182 and an 89-bp region harboring rs3777189 both altered ELL2 expression, supporting that the ELL2 eQTL is caused by genetic variation in multiple intronic regulatory elements that are involved in the transcriptional regulation of ELL2 (Fig. 7a, b and Supplementary Fig. 10a, b).

Fig. 7: Deletion data for rs3777182, rs3777183, and rs3777189 at ELL2 and rs4487645 at CDCA7L.
figure7

a Quantitative PCR data showing altered expression relative to control of ELL2 upon dual-sgRNA CRISPR/Cas9 deletion of a 141-bp region harboring rs3777182 and rs3777183 in RPMI-8226 cells, which are heterozygous the rs3777189, rs3777182, and rs3777183 variants. The agarose gel below confirms the deletion of the CRISPR/Cas9-targeted region. b Corresponding data for an 89-bp region harboring rs3777189. c Corresponding data for a 76-bp region harboring rs4487645 in OPM2 cells, which are homozygous for the rs4487645-C variant. The P values are for the two-sided Student’s t-test. Blue bars in (a) through (c) indicate the averages of the individual measurements. dWe successfully edited rs4487645[C > A] in L363 cells using CRISPR-HDR. We generated six C-homozygous, three C/A heterozygous, and six A-homozygous single-cell clones. Consistent with the other data for rs4487645, we detected an association between CRISPR-edited rs4487645 genotype and CDCA7L expression, with the C allele yielding higher expression, further supporting causality. The y-axis indicates residual CDCA7L expression qPCR expression value in L363 cells, taking into account the CDCA7L copy number in each single-cell clone. The bottom, middle, and top of each box plot represent the 25th, 50th, and 75th percentiles. The whiskers represent the non-outlier minimum and maximum values, located at 1.5 times the interquartile range from the bottom and top of the box, respectively. The P value is for correlation between edited genotype and CDCA7Lexpression, taking CDCA7L DNA copy number into account as a covariate.

rs4487645 maps to the DNAH11-CDCA7L locus (Fig. 4d and Supplementary Fig. 8d), and the risk allele rs4487645-C upregulates the cMyc-interacting CDCA7L29,46. We previously proposed rs4487645 as causal, finding that rs4487645-C creates a new IRF4 binding site13. Our current analysis provides additional unbiased evidence for this variant indeed being the functional basis of the 7p15.3 association. CRISPR/Cas9 deletion of a 76-bp region harboring rs4487645 downregulated CDCA7L (Fig. 7c and Supplementary Fig. 10c), supporting a regulatory link between the region and CDCA7L. Moreover, we employed CRISPR/Cas9 with homology-directed repair (HDR) to generate L363 single-cell clones with different rs4487645 genotypes. In total, we generated six rs4487645-C-homozygous clones, three rs4487645-C/A heterozygous clones, and six rs4487645-A-homozygous clones. We observed a significant association between rs4487645 genotype and CDCA7L expression, with the C allele yielding higher expression (Fig. 7d), further supporting that rs4487645 causes the CDCA7L eQTL.

Finally, rs11960493 and rs6066832 map to CEP120 and PREX1, respectively, and upregulate these genes in plasma cells (Fig. 4e, f and Supplementary Fig. 8e, f). CEP120 is implicated in microtubule assembly47, and PREX1 encodes a guanine nucleotide exchange factor mutated or aberrantly expressed in several cancers48,49. While we predicted several motif changes for both variants (Supplementary Data 2) and a looping interaction between the rs6066832 region and the PREX1 promoter (Fig. 4e and Supplementary Fig. 8f), we could not identify differentially bound proteins.

Effects in the endogenous chromosomal context in vivo

Following characterization in vitro, we investigated if the eight selected variants are active in an endogenous chromosomal context in vivo. Altered gene-regulatory activity is associated with the release or recruitment of proteins to DNA and/or changes in chromatin structure. In turn, this could cause allele-dependent changes in accessibility (chromatin accessibility quantitative trait loci, caQTLs) around the variant, detectable by ATAC-seq of limited numbers of primary cells. Hence, we performed ATAC-seq on plasma cells from MM patients. To detect caQTLs, we estimated the local ATAC-seq signal intensity as the average Tn5 transposase cut-site density across a 150-bp sliding window positioned at every 10 bps across LD regions and examined correlations with the MM lead variant. We also developed a segmentation algorithm (“caQTLseg”) to partition LD regions into subregions with either allele-dependent or allele-independent accessibility.

In an initial set of 56 ATAC-seq samples, we detected lead variant caQTLs at SMARCD3, CDCA7L, and CEP120 (Supplementary Fig. 11). For replication, we performed ATAC-seq on an additional 105 samples. In a combined analysis of all 161 samples, the three caQTL signals increased in significance (Fig. 8). The regions identified as having allele-dependent accessibility were identified with a broad range of caQTLseg parameter settings (Online Methods and Supplementary Figs. 12 and 13). Furthermore, the SMARCD3 and CDCA7Lsignals were centered at rs78740585 and rs4487645, and were the only LD variants within their caQTLs; both of these risk variants create new IRF4 binding sites. Consistent with the recently described role of IRF4 as a pioneer-like transcription factor that regulates chromatin accessibility50,51,52,53, the SMARCD3- and the CDCA7L-high-expressing MM risk alleles associated with increased accessibility at rs78740585 and rs4487645 (Fig. 8a, b). By contrast, the caQTL at CEP120 (Fig. 8c) was more complex, encompassing rs11960493 plus eight other LD variants, one of which (rs62376437) was borderline-significant in the MPRA (qvalue 7.57 × 10−6 in L363; 2.72 × 10−1 in MOLP8; Supplementary Data 1) and concordant with the CEP120 cis-eQTL, suggesting multi-variant causality, as in the case of the ELL2association. These results demonstrate that three of our selected MPRA-functional variants co-localize with significant plasma cell caQTLs, signaling the presence of causal regulatory activity at these variants in an endogenous chromosomal context in vivo.

Fig. 8: Identification of co-localized caQTLs at MPRA-functional variants.

We performed ATAC-seq on plasma cells from 161 MM patients and scanned the LD regions for lead variant caQTLs using two computational approaches. a In the SMARCD3 region, we detected a significant caQTL around rs78740585, with the minor allele conferring increased accessibility. Consistent with this, rs78740585[T > A] showed a positive MPRA log2 score and the rs78740585-A risk allele creates an IRF4 site (Supplementary Fig. 4). b In the CDCA7L region, we detected a significant, 1.6 kb wide caQTL around rs4487645, with the major allele conferring increased accessibility. Consistent with this, rs4487645[C > A] showed a negative MPRA log2 score and the rs4487645-C risk allele creates an IRF4 site. c At CEP120, we detected a significant caQTL covering rs11960493 and eight other variants, including rs62376437 which was borderline-significant in MPRA, suggesting the CEP120 association is enshrined in multiple causal variants. Dashed blue indicates a false discovery rate (−log10 Q value) for Pearson correlation between ATAC-seq signal intensity and lead variant genotype. Regions with lead variant-dependent accessibility called by caQTLseg are indicated in light blue. Upper panels show full regions of LD, lower panels are close-ups of highlighted regions. Red circles indicate variants that show evidence of association with MM (data from Fig. 1a; variants with P < 10−5 for association shown). In the lower panels, average local ATAC-seq signal intensity across individuals with different lead variant genotypes is indicated by the yellow (minor/minor), orange (minor/major), and red (major/major) lines.

Discussion

We have carried out a systematic functional analysis of inherited noncoding variants that predispose for MM. Our analysis represents a functional dissection of inherited noncoding variation predisposing for a hematologic malignancy. To our best knowledge, MPRA and caQTL analysis have not been previously used as mutually complementary approaches to identify putative causal variants. While MPRA is a powerful in vitro screening approach, caQTLs provide evidence for causal regulatory activity at specific genomic positions in an endogenous chromosomal context in vivo.

Our analysis identifies eight putative causal regulatory variants at six risk loci: SMARCD3, WAC, ELL2, CDCA7L, CEP120, and PREX1. Out of these variants, seven map to intronic regions within their target genes, and one maps to an enhancer region within a neighboring gene (Fig. 4). These observations are in accordance with other studies where GWAS signals have been dissected functionally (c.f., refs. 14,16,19,23,54,55). Notable findings include a variant effecting ectopic expression of the SWI/SNF gene SMARCD3 in plasma cells by introducing a new IRF4 site into an enhancer, and a variant upregulating the autophagy gene WAC by creating a POU2F1 site. Additionally, we find evidence for multi-variant causality at ELL2 and CEP120, and further support for rs4487645 being a causal variant at CDCA7L. Collectively, our findings provide functional insight into the genetic architecture of MM predisposition.

Regarding limitations, functional dissection of a GWAS signal should ideally include systematic perturbation of each variants within the LD block, for example using CRISPR-HDR or base editors (to replace each reference allele with its corresponding variant allele or vice versa in situ). However, it is widely recognized that such an approach is currently not possible, both because of the workload and because only some variants are accessible to precision editing because of the lack of nearby sgRNAs, and base editors can only achieve certain types of base changes. Additionally, in the case of MM, it is not possible to culture primary plasma cells or primary multiple myeloma cells ex vivo, and thus any editing experiments will need to be done in cell lines. For these reasons, we instead followed up our MPRA screen with dual-sgRNA CRISPR/Cas9 experiments to link variant-harboring regions to eQTL target genes. We achieved successful editing of rs4487645 at CDCA7L, whereas precision editing was not achieved for the other variants of interest. Finally, we carried out caQTL experiments in primary MM plasma cells, demonstrating allele-dependent chromatin accessibility (as a sign of altered regulatory activity) at the positions of theSMARCD3, CDCA7L, and CEP120 MPRA-functional variants in an endogenous chromosomal context.

Deciphering the functional basis of cancer risk variants provides for a more comprehensive understanding of the biological networks underlying tumorigenesis and predisposition. Here we have addressed this challenge in the context of MM by combining information from high-throughput functional screens, QTL analyses, and additional assays. Our integrative approach illustrates how functional dissection of noncoding variation influencing the development of human malignancies can be undertaken.


Article classification: Biological abstract
Share to:
Tel:+86-027-87610298
Tel:+86-027-87610297
Add:Room A11-329, 1st Floor, No.1, SBI Venture Street, Optics Valley, East Lake
New Technology Development Zone, Wuhan, China.
Certificate NO.:U18Q28010569R0S