Xinqidi,Your biological research partner
Xinqidi Biotech Co.,Ltd,Wuhan,China 2008-2021
R&D 13th year

Spatial deconvolution of HER2-positive breast cancer delineates tumor-associated cell type interacti

Issuing time:2021-10-22 10:42


In the past decades, transcriptomic studies have revolutionized cancer treatment and diagnosis. However, tumor sequencing strategies typically result in loss of spatial information, critical to understand cell interactions and their functional relevance. To address this, we investigate spatial gene expression in HER2-positive breast tumors using Spatial Transcriptomics technology. We show that expression-based clustering enables data-driven tumor annotation and assessment of intra- and interpatient heterogeneity; from which we discover shared gene signatures for immune and tumor processes. By integration with single cell data, we spatially map tumor-associated cell types to find tertiary lymphoid-like structures, and a type I interferon response overlapping with regions of T-cell and macrophage subset colocalization. We construct a predictive model to infer presence of tertiary lymphoid-like structures, applicable across tissue types and technical platforms. Taken together, we combine different data modalities to define a high resolution map of cellular interactions in tumors and provide tools generalizing across tissues and diseases.


Transcriptomic and genetic studies have revolutionized our understanding of cancer and led to the development of new diagnostic and therapeutic tools. Recent advances in single-cell RNA sequencing (scRNA-seq) technologies have provided extensive insight into the cellular composition of tumors1,2,3. Building on decades of clinical and pre-clinical research, numerous scRNA-seq studies have analyzed tumor-associated cells, some of which mainly focused on cancer cells4,5,6,7,8,9,10,11,12, immune cells13,14,15,16,17,18,19, or fibroblasts20,21,22. In these studies, cell types were split into finer distinctions based on their molecular profiles. Together, these cellular subsets form complex ecosystems that continuously interact with each other to govern tumor progression and treatment outcome; yet many fundamental mechanisms regarding how and where tumor-associated cell subsets interact remain unresolved.

Although scRNA-seq offers a high-throughput analysis of cells’ transcriptomes, their spatial context is lost during tissue processing. In contrast, techniques such as immunohistochemistry (IHC) and in situ hybridization provide high spatial resolution, but often require a priori selection of targets, making these methods less suited for high-throughput exploratory analyses. In addition, recent work has shown that most tumor-associated cell states are complex and not easily defined by a few marker genes or surface receptors2, which presents both a technical and financial challenge to targeted spatial methods23. Therefore, Spatial Transcriptomics (ST), developed by Ståhl and Salmén et al.24, which provides spatially resolved and transcriptome-wide expression information, is highly suitable to investigate cell interactions and spatial aspects of gene expression in the tumor stroma.

Breast cancer remains an important public health concern, claiming more than a hundred lives every day in the US alone, and its often arduous treatment takes a significant toll on patients25. Breast cancer is divided into several subtypes, including HER2-positive tumors, which are defined by an enrichment of the HER2 (human epidermal growth factor receptor 2) expression by tumor cells26,27. An estimated 15–20% of all breast cancers tumors are HER2-positive, and these tumors usually exhibit aggressive growth and demand intense treatment28,29. Research into the molecular underpinnings governing breast tumor progression has yielded considerable clinical benefits for HER2-positive breast cancer patients; most notably, the introduction of HER2-targeted therapies that has drastically improved disease outcome30. Despite this progress, many patients with metastatic HER2-positive breast tumors still succumb to the disease due to primary or acquired resistance to anti-HER2 therapies31. Alternative strategies to treat HER2 cancer patients are therefore needed.

In the past decade, drugs targeting the immune system have increased patient survival in several different cancers32. Traditionally, breast cancer has not been considered an immunogenic disease, but an increased presence of tumor-infiltrating lymphocytes has consistently been associated with a favorable breast cancer prognosis33. Elevated tumor-associated lymphocyte levels are observed in triple-negative (TNBC) and HER2-positive breast cancer, suggesting that they are more immunogenic compared to other breast cancer subtypes34. Furthermore, tertiary lymphoid structures (TLSs), which are lymph-node-like structures that can form ectopically in tissues such as tumors, holds certain predictive power of treatment outcome in HER2-positive tumors35,36,37. However, to date, immune checkpoint blockade treatment has only been clinically approved for TNBC. There is therefore a need to understand fundamental mechanisms that define HER2-positive tumors’ molecular profile and cell composition.

In this study, we seek to deepen our understanding of the spatial aspects of tumor-associated cellular relationships by applying ST to human breast cancer. More specifically, we use ST to survey spatial-gene expression and cell types in 36 samples collected from eight HER2-positive individuals. We examine intra- and interpatient heterogeneity using a number of different methods, including expression-based clustering and single-cell data integration. We here define: (i) shared spatial expression signatures among HER2-positive patients, (ii) high-resolution cell state colocalization patterns that expose a type I interferon-associated macrophage T-cell interaction, and (iii) a method to identify putative tertiary lymphoid-like structures in ST data.


HER2-positive tumors from eight individuals (patient A-H) were subjected to ST with three (adjacent) alternatively six (evenly spaced) sections obtained from each tumor (n = 36 sections) (Methods and Supplementary Figure 1). For brevity, we will refer to sections originating from the same individual as replicates. Figure 1 provides an overview of the analysis workflow and methods.

Fig. 1: Overview of study.

After sample retrieval, we performed ST (Spatial Transcriptomics) on 36 sections confirmed to be HER2-positive. A trained pathologist manually annotated one section from each sample. Expression-based clustering and single-cell data integration were applied to explore the spatial expression profiles and cell type interactions in our data. Marker genes were extracted for each of the clusters and subjected to functional enrichment analysis, which allowed us to biologically annotate them. By deconvolving the expression profiles in each spot with the single-cell data, we could infer patterns of cell state colocalization and design a model for the prediction of tertiary lymphoid-like (TL-like) structures. The single-cell data and its associated cell annotations originate from an external source also examining HER2-positive tumor samples.

Manual annotation and initial data characterization

One section from each tumor was examined and annotated by a pathologist (A.E.) based on the morphology of the associated HE-image (Hematoxylin and Eosin). Regions were labeled as either: in situ cancer, invasive cancer, adipose tissue, immune infiltrate, or connective tissue (Fig. 2A and Supplementary Figure 2).

Fig. 2: Results from the expression-based analysis of patient G.

A Morphological regions were annotated by a pathologist into six distinct categories: adipose tissue (cyan), breast glands (green), in situ cancer (orange), connective tissue (blue), immune infiltrate (yellow), and invasive cancer (red). B Split view of each expression-based cluster’s distribution across one tissue section. C UMAP embedding of all spots (n = 446) from the three replicates, markers are colored based on cluster identity. D Proportions of spots assigned to each cluster across the three replicates. E Dot plot showing the overlap between clusters and annotated regions. The size of the dots represents the proportion of cluster spots belonging to an annotated region. The pathologist’s annotations are given on the x axis, cluster annotations on the y axis. F. Heatmap of the clusters and the most highly differentially expressed genes. Each cluster was annotated based on its association with morphological regions and marker genes. Heatmap colors represent normalized and scaled expression values. APC stands for antigen-presenting cell.

To explore the spatial-gene expression data, we first applied common techniques of normalization, and visualized it in 2D-space using UMAP (Methods and Supplementary Figure 3)38. Spatial capture locations (hereafter, spots) from different patients separated into isolated clusters, with a high degree of intermixing between each patient’s replicates. This clustering pattern implies the presence of interpatient heterogeneity, as can be expected when working with tumor samples39. In scRNA-seq data, immune cells, but not tumor cells, typically mix between patients16. In ST, multiple, often different, cell types contribute to each spot’s transcription profile (~0–200 cells/spot); therefore, even immune-rich spots tend to separate in a patient-specific manner40. Thus, to properly capture the nuances of each patient’s molecular profile and not risk quenching weak signals, we initially analyzed each patient separately.

Expression-based clustering

For the patient-wise analysis, we split the data into mutually exclusive patient subsets, each being independently normalized with the intention to remove technical noise and batch effects. Next, we clustered our data using Seurat’s shared nearest neighbor approach (clusters are referred to as pXcY; X = patient id, Y = cluster id)41. Spots neighboring in physical space were often assigned to the same cluster, and the clusters aligned with the pathologist’s annotations (Fig. 2A–C and Supplementary Figure 2). Furthermore, the spatial arrangement of clusters and the fraction of spots in each cluster were consistent across replicates (Fig. 2D, Supplementary Figure 35).

Cluster annotation

To better understand what biological entities the expression-based clusters represented, we performed differential gene expression analysis, resulting in a set of marker genes upregulated within and characteristic of each cluster (Fig. 2F). The marker genes were selected by a combined cutoff with respect to their adjusted p value and fold change (Methods and Supplementary Data 1). To assess enrichment of biological pathways, we queried the clusters’ marker genes against the GO:BP (GO Biological Processes) database using g:Profiler42. Based on the set of marker genes and their associated pathways, we annotated the expression-based clusters (Supplementary Data 2); these annotations are not exhaustive but serve to provide guidance for subsequent analysis.

Comparison of clusters with pathologist-annotated regions

We related the pathologist-annotated regions to the expression-based clusters by computing the fraction of spots within a region that belonged to each cluster (Fig. 2E). Strong concordance was observed; clusters enriched for immune-related processes overlapped well with the immune infiltrates and clusters with cancer-associated pathways fell into the cancerous regions (Supplementary Data 3). This comparison established an agreement between the pathologist’s annotations and the clusters defined by our data-driven approach. Notably, an in situ cancer region that consisted of only three spots correctly clustered with physically separated, but identically annotated spots (pGc4 in Fig. 2B and Supplementary Figure 6), attesting to the sensitivity of ST and verifying that clusters are not only driven by physical proximity. In patient H, parts of a region labeled as in situ cancer were consistent, across all replicates, inhabited by two expression-based clusters; one (pHc1) that overlapped with the other cancer areas while the other (pHc4) was enriched for immune processes and aligned spatially with the annotated immune infiltrates (Supplementary Figure 6). These observations suggest that data-driven expression-based clustering captures signals that may be overlooked by visual inspection; thus in some cases providing a more in-depth and nuanced depiction of the tissue.

Exploring intra- and interpatient heterogeneity

Multiple tumor profiles may exist among a group of individuals, but also within a given patient. Characterization of this heterogeneity could aid in the design of more nuanced and personalized treatment regimens43. Therefore, we used the expression-based clusters as a framework to examine intra- and interpatient heterogeneity in our data.

Interestingly, we observed intrapatient heterogeneity at the transcriptome level in most of our patients; all patients except two (patient B and H) had more than one cluster labeled as cancer. Given the HER2-receptor’s prominent role in the etiology of the HER2-subtype, we also examined the ERBB2 (encoding the HER2-receptor) expression across the cancer clusters. We observed significant differences in ERBB2 gene expression (two-sided Mann–Whitney U test, padj < 0.05) between clusters in the same patient, attesting to a certain spatial heterogeneity of ERBB2 expression (Supplementary Figure 7 and Supplementary Table 1). Patient E exhibited varying transcription profiles in two spatially separated tumor foci assigned to different clusters (pEc3 and pEc4) (Supplementary Figure 2). While such observations could be a consequence of “overclustering”, the two clusters were separated and non-neighboring in UMAP-space, which suggested distinct expression profiles. Both clusters had ERBB2 listed as a marker gene and were enriched for pathways associated with cell growth, but one of the clusters (pEc3) displayed a high degree of enrichment for immune response-related processes (e.g., humoral immune response, padj = 4.0 × 10−5; immune system process, padj = 8.0 × 10−5) while apoptotic and regulatory pathways were enriched in the other (pEc4: e.g., cell death, padj = 6.5 × 10−9; apoptotic processes, padj = 1.6 × 10−8) (Supplementary Data 4). These findings implied that one tumor focus (pEc3) likely had a higher degree of infiltrating immune cells than the other. Thus, ST enables us to further spatially demarcate regions with distinct molecular profiles that have similar morphology.

Immune and tumor core signatures

To assess the presence of universal features in our data, we compared clusters across patients with respect to their marker genes, selected as described above. We reasoned that if two clusters shared a large number of marker genes, they should be considered more similar than if the converse was true. To translate this notion of similarity into a quantitative metric, we computed the Jaccard Index for every combination of cluster pairs. Five distinct supergroups (a.k.a, “clusters of clusters”) emerged after the clusters were hierarchically grouped based on similarity (Methods and Supplementary Figure 8). Only supergroups with at least one shared gene among the members were considered robust, three of the five supergroups fulfilled this criterion. In the remaining three supergroups, marker genes present in a majority of the member clusters (at least 80%) were considered representative core signatures. Two core signatures were immune-related: the first being a set of 47 genes, including APOE and C1Q{A,B,C} highly expressed by macrophages (Mø), suggesting that clusters in the corresponding supergroup might contain tumor-associated Mø16; the second was a set of 55 genes with lymphocyte and MHC class I–II-associated members (e.g., TRBC1, HLA-{A,B}, HLA-D{QB1,RA,RB1})44. The third core signature consisted of 11 genes, with several of them being related to cancer and proliferative growth (e.g., ERBB2, EPCAM, and CDH1)45,46. This cancer core signature was derived from a supergroup where all clusters were annotated as cancer-associated. For clarity, the aforementioned cancer supergroup consisted exclusively of cancer-associated clusters, but not all such clusters were members of this group. Complete core signatures are found in Supplementary Data 6.

Inference of cell type organization by integration with single-cell data

Given how the spatial arrangement and patterns of interaction between different cell types have implications for both disease progression and treatment, we wanted to chart each cell type’s spatial distribution within the tissue. However, ST data do not provide single-cell resolution, i.e., each spot hosts several cells of potentially different types, meaning that spots ca not always be exclusively assigned to a single cell type. A common strategy to estimate cell type abundance at the capture locations is to integrate ST and single-cell/nuclei RNA-seq data in order to deconvolve the mixed observations of the former, effectively mapping cell types into spatial context. Such integrative methods have previously been successfully applied to spatial cancer data47. To deconvolve our ST data, we employed the stereoscope method, which for every spot in the spatial data estimates the proportion of cells that belongs to each cell type defined in a given single-cell data set, producing a nspots × ncell types matrix of proportion values48. The stereoscope method uses complete expression profiles, facilitating distinction between similar cell types with overlapping sets of marker genes, which is especially important in a complex and intermixed environment like tumors.

A scRNA-seq data set from five HER2-positive tumors, annotated in three tiers (Methods) was used to deconvolve our spatial data49. The three tiers are referred to as the major, minor, and subset tiers (terminology from Wu et al.). There were eight different cell types in the major class: myeloid cells, T cells, B cells, epithelial cells, plasma cells, endothelial cells, cancer-associated fibroblasts (CAFs), and Perivascular like cells (PVL cells). The minor tier represents finer partitioning of the major cell types, e.g., Mø and CD8+ T cells. In turn, the lowest tier further splits the minor tier into cell states or “subsets”, such as chemokine-expressing Mø and IFNγ-expressing T cells. Excerpts from the integrative analysis are given in Fig. 3. Supplementary Data 7 contains a visualization of all of the remaining patients and tiers, all output from stereoscope is found in Supplementary Data 8.

Fig. 3: Spatial mapping of cell types from scRNA-seq data.

A Enrichment (green) and depletion (red) of major tier cell types in the regions defined by the pathologist (patient G), along with proportion estimates of different cell types (e.g., epithelial, CAFs, plasma cells, and B cells). Spot annotations are indicated by border colors, included regions are: in situ cancer (orange), invasive cancer (red) and immune infiltrate (blue). B Similar to A but with the regions defined by the expression-based clusters. C Correlation plot of all cell types within the major tier across all 36 sections and patients, a distinct correlation between myeloid cells and T cells can be observed. D Proportions of myeloid cells and T cells showing one area with higher (1), respectively, lower (2) degree of colocalization. All presented results are associated with patient G, except for subfigure C where correlation values are computed across all patients. CAFs cancer-associated fibroblasts, PVLs perivascular-like cells.

Interactive exploration of results

We have compiled a resource that contains all data and results from the expression-based clustering and single-cell mapping; with a graphical user interface (GUI) that enables comprehensive exploration of these (Code Availability).

Enrichment of cell types within manually defined regions

We next examined cell type enrichment/depletion within the pathologist-annotated regions. Several affirmative trends were observed, B and T cells were enriched in the immune infiltrates while cancer regions showed enrichment of cancer-related cell types and depletion of several stromal cell types (patient G in Fig. 3A, all patients in Supplementary Data 9). All patients except patient B showed enrichment of the HER2-related epithelial cancer type in invasive cancer regions (Supplementary Figure 9). In contrast, patient B exhibited depletion of the HER2-related epithelial cell type and enrichment of the luminal B (LumB) type in the invasive cancer regions. Coincidentally, patient B was also unique in having a progesterone-receptor positive profile, in concordance, with the LumB molecular subtype (Supplementary Figure 9 and Supplementary Table 2)50. Taken together, these findings were seen as affirmative of our mapping’s validity.

Enrichment of cell types within expression-based clusters

We examined enrichment/depletion of cell types within the expression-based clusters to see how the single-cell mapping related to these (Fig. 3B and Supplementary Data 9). In patient E—with the two spatially disconnected tumor foci—the cluster associated with apoptotic and regulatory pathways (pEc4) was enriched for certain epithelial cells and depleted of memory B cells. In contrast, the immune-rich cancer cluster (pEc3) was enriched for memory B cells and CD4+ T cells, with weak or no enrichment of cancer types (Supplementary Figure 10). In patient G, all clusters annotated as cancer were: depleted of plasma cells, had very low enrichment, or were depleted of B and T cells, and three out of four clusters were enriched for epithelial types (Fig. 3B). The in situ cancer cluster (pGc4) was the only cancer cluster in patient G enriched for dendritic cells across all replicates while also being depleted of myofibroblast-like CAFs (Supplementary Figure 11). We also observed how the plasma cell immune cluster (pGc1) was enriched for plasma cells while the antigen-presenting cell (APC) immune cluster (pGc3) exhibited stronger enrichment of B cells, T cells, and myeloid cells. PVL cells were overrepresented in mixed cancer/connective tissue clusters (pGc6), which also showed enrichment of myeloid cells, CAFs, and endothelial cells (Fig. 3B and Supplementary Figure 12). Agreement with the tissue morphology, pathologist annotations, and expression-based clusters suggest that the single-cell data are sufficiently representative of our tissues to provide a reliable mapping of the included cell types.

Spatial maps of population diversity

To obtain an overview of the cell population’s diversity within different spatial regions, we computed the entropy for the cell type distribution (proportion estimates) within each spot (Methods). A high entropy score indicates high diversity while a low entropy score indicates the opposite. We found that the immune-related clusters were more heterogeneous than the cancer clusters, an effect more pronounced in patient A. The APC-enriched clusters, when present, tended to exhibit the highest degree of diversity, with clusters of patient B being an exception (Supplementary Figure 13).

Trends of cell type colocalization

The cell mapping was used to explore putative cellular interactions by computing the cell type spot-wise Pearson correlation (Fig. 3C), with a positive correlation between two cell types being considered as them colocalizing. At the major tier, the most prominent feature—present in all patients—was an anticorrelation between epithelial cells and every other cell type (Fig. 3C). The endothelial cells (major tier) also exhibited well-preserved colocalization patterns with CAFs (all patients except G) and perivascular cells (all patients except F) (Fig. 3C). At the minor tier, epithelial cells are split into cancer and normal epithelial cells, which anticorrelate in all patients except patient E (Supplementary Figure 14). For patient-wise colocalization matrices, we refer to Supplementary Data 10. The increased cell type resolution thus revealed how the cancer epithelial cells are the main contributors to the trend of epithelial cell anticorrelation observed in the major tier. There is also a spatial separation between the two CAF types at the minor tier (all patients except patient A, Supplementary Figure 14). At the subset level, mature Luminal cells (a subset of normal epithelial cells) always colocalize with one cancer type (except in patient A), which could suggest proximity to luminal cells or a mature luminal phenotype (Supplementary Figure 15). In the immune cell population, plasma cells are anticorrelated with B cells in all patients (Fig. 3B, C) except one (patient A, Supplementary Figure 16). These findings indicate that B cells and plasma cells, reside at distinct locations within the tumor51. It is not clear whether these findings reflect plasma cell migration during local differentiation from tumor-associated B cells, or if the plasma cells have developed from B cells outside the tumor microenvironment. Colocalization of varying strength between B and T cells (major tier) was observed in five of the eight patients; patients G and H exhibited particularly strong colocalization signals and their B and T-cell distributions had an ample overlap as shown below and in Supplementary Figures 17 and18.

We also observed that T cells colocalized with myeloid cells (Fig. 3C, D); of interest, since interactions between T cells and myeloid cells are well established and can profoundly affect their respective behavior52. Recent studies have also revealed a substantial heterogeneity within T-cell and myeloid cell types, where their respective subsets exhibited a diverse spectrum of states13,16,18. When the finer tiers of T cells and myeloid cells were examined, several trends of colocalization could be observed; such as weak positive signals between cDC2:CD1C, Mø1:EGR1, and pDC:IRF7 with several CD4+ T-cell populations, including Tfh and Tregs. We also observed a salient correlation between Mø2:CXCL10 and a T-cell state (T cells:IFIT1) across all patients (Fig. 4A) and wanted to explore this further.

Fig. 4: Colocalization of myeloid cells and T cells.

A Correlation plot of T- and myeloid cell subsets showing a distinct correlation between the T-cell:IFIT1 and Mø2:CXCL10 (macrophage 2:CXCL10) subsets across all patients. B Enrichment (green) and depletion (red) of subsets of T- and myeloid cells in each expression-based cluster, highlighting the presence of the correlated cell types T-cell:IFIT1 subset and Mø2 within cluster 4 of patient G (pGc4). C Proportion estimates for T-cell:IFIT1 subset and Mø2 in pGc4. D Pathways enriched by marker genes for pGc4, type I interferon signaling pathways are highlighted in red. Intersection size is equivalent to the number of overlapping terms between pGc4 marker genes and the given pathway. E Spot-wise enrichment of a type I interferon signaling pathway (GO:0060337) visualized on patient G, the p values which the enrichment scores are based on were computed using a two-sided Fisher’s exact test (Methods), no adjustment for multiple hypothesis testing was applied since a single hypothesis is tested in each spot (enriched or not enriched).

Presence of type I interferon response processes

As indicated in the scRNA-seq resource, Mø2:CXCL10 expressed increased levels of the chemoattractants CXCL9 and CXCL10. Both of these chemokines bind CXCR3, typically found on T cells and NK cells53,54. Tumor-associated myeloid cells expressing CXCL9/10 have previously been described and attributed important immunotherapy-induced antitumor functions16,55,56,57. Furthermore, CXCL9/10 expression may be induced by type I interferon stimuli49. Similarly, several of the marker genes for the T-cell:IFIT1 state are also associated with a type I interferon response.

Type I interferon activation within tumors can act directly on tumor cells, to inhibit proliferation or stimulate apoptotic processes, or indirectly by activation of antitumor immunity58,59. In addition, certain anticancer therapies have been shown to induce and depend on type I interferon activation59. Given the relevance of type I interferon responses in cancer treatment, we wanted to evaluate its association with the Mø2:CXCL10 and T-cell:IFIT1 cell state colocalization signal in our spatial data. Thus, we inspected the cell “type within cluster” enrichment results and noted that a majority of the patients had at least one cluster (e.g., pGc4) enriched for both Mø2:CXCL10 and T cells:IFIT1 (Fig. 4B and Supplementary Data 9). Consequently, we aimed to make a more quantitative estimate of how this joint presence of the Mø and T-cell states was distributed among our clusters. For this purpose, we devised a joint score based on the cell type proportions, where high values indicate high proportion values of both cell types. Stratifying the spots from each sample by their cluster identity, several clusters with elevated signals of the joint score emerged, e.g., pBc3, pDc5, pEc1, and pGc4 (Supplementary Figure 19-Supplementary Figure 20), in agreement with the cluster-cell type enrichment results (Fig. 4B and Supplementary Data 9). Furthermore, interferon response-related pathways were enriched within all clusters with high joint scores (Supplementary Data 4).

Spatial enrichment of type I interferon responses

To chart the interferon pathway expression, we conducted a spatial enrichment analysis. Regions with high enrichment of the type I interferon pathways spatially aligned with areas of joint Mø2:CXCL10 and T-cell:IFIT1 presence (Fig. 4C–E and Supplementary Figure 21). Notably, the relationship between the type I interferon signal and the joint presence of the cell states appears to be asymmetric; joint Mø2:CXCL10 and T-cell:IFIT1 presence overlaps with an elevated interferon signal but the opposite is not always true. Such asymmetry is expected since other cell types are known to exhibit and be stimulated by interferon signaling pathways59. Still, our results imply that joint localization Mø2:CXCL10 and T-cell:IFIT1 subsets often occur in the presence of a type I interferon signal. We confirmed these findings in an independent HER2-positive spatial-gene expression data set, produced using the Visium platform (Supplementary Figure 22). Finally, we showed the presence of the same signal in a data set from a vastly different cancer form (squamous cell carcinoma, SCC) generated with two distinct platforms, ST2K and Visium (Supplementary Figure 23 and Supplementary Note 1)60. Our results suggest that a spatially restricted type I interferon response may relate to Mø-induced recruitment of particular T-cell subsets. Further investigations would be useful to establish whether these interactions are relevant to disease outcomes.

Inferring tertiary lymphoid-like structures from cell type proportions

Next, we returned to the patterns of B- and T-cell colocalization, more specifically how this related to TLSs. Our interest in TLSs stems from their cardinal role in antitumor immune responses and relation to the clinical outcome as well as treatment response. In the context of cancer, TLSs are one of the locations where tumor antigens are presented to T cells, promoting a more-targeted antitumor attack 61.

As TLSs are defined by the presence and interaction of multiple cell types, scRNA-seq techniques are suboptimal for studying them unless the sites are separated from the remaining tissue prior to dissociation. We, therefore, see our use of ST, where each spot represents a small neighborhood populated by multiple cells, as complementary to scRNA-seq methods when studying these structures. Although TLSs are not exclusively inhabited by B and T cells, they are implicated by their joint presence62. Having deconvolved the cell type composition of each spot, we were able to identify spots that exhibited a high degree of colocalization between B and T cells, ergo potentially constituting parts of a TLS-site that we will call TL-like structures. More explicitly we did this in a fashion similar to how the scores for the joint presence of Mø2:CXCL10 and T-cell:IFIT1 subsets were computed, resulting in a metric we refer to as TLS score (Methods). A positive TLS score translates to B and T cells both being enriched for at a site, negative values the opposite. As expected from the overlap in B and T-cell distribution, patient G and H exhibited small compartmentalized regions with high TLS scores (Fig. 5A) here considered as TL-like structures.

Fig. 5: Inference and prediction of tertiary lymphoid-like Structures.

A Proportion estimate of B and T cells together with the computed TLS score (tertiary lymphoid structure) for patients G and H, annotated Hematoxylin and Eosin (HE) images are included for reference. B Rank-plot (coefficient value vs. rank) of the fitted model, genes included in the TLS signature are indicated by red; the zoom-in (*) shows the rank TLS-associated genes. C Top 15 pathways of which the TLS signature showed enrichment of, ranked according to the adjusted pvalue (as provided by g:Profiler). D Predicted TLS score for the 10x GenomicsTM Visium breast cancer data set, using the model trained on patient G and H. Pathologist’s annotation for likely TLS-sites (red) are included as a reference. Scale bars represent 1000 μm.

Experimental validation of TL-like structures

To validate that our estimates of joint B- and T-cell presence were suitable as proxies for TL-like structures, we used IHC. We stained against the canonical cell type markers CD20 (B cells) and CD3 (T cells) on three additional sections from patient H, one double staining and two single stains (Methods). These sections were not adjacent to the sections subjected to ST, but as proximal as technically feasible in order to preserve as much as possible of the major histological regions. Our results showed characteristic features of previously described immature TLSs—compartmentalized structures of B cells surrounded by T cells, but lacking distinct germinal centers—in regions of elevated TLS signals (Supplementary Figure 24) 61,63.

Characterizing the gene expression profiles of TL-like structures

Next, we wanted to assess how the gene expression related to the TLS score. For this purpose, we used a simple linear model to predict the TLS score of a spot, based on its (normalized) gene expression, and then extracted the genes with the largest coefficients, i.e., highest contribution to a positive score; we refer to this set of genes as a TLS signature. The number of signature members (171 genes, Fig. 5B, Supplementary Data 5) was determined by a threshold derived from the trained model (Methods). The three genes with the largest coefficient values were: MS4A1 (a well-known B-cell associated gene, encoding the antigen CD20), B2M (encoding a protein that interacts with and stabilizes MHC I), and TRBC2 (encoding a component of the T-cell receptor). Other signature members have previously been associated with TLSs e.g., CXCL13, CXCR5, CCL19, and LTB, the latter two which suggest the presence of other TLS-associated cell types in addition to B and T cells (Fig. 5B)61. To see what biological processes the TLS signature was enriched for, we subjected it to functional enrichment analysis (using g:Profiler, querying against GO:BP). The top processes were all related to cell activation, differentiation, and immune response or regulation (Fig. 5C and Supplementary Data 12).

Model validation by cross-platform prediction and prediction of survival outcome

To control for overfitting, we applied the model to (HER2-positive) breast cancer data originating from a different platform (Visium, downloaded from 10x GenomicsTM website, see Data Availability). Strong concentrated signals were observed in the Visium data, overlapping with the region identified as a likely TLS-site by our pathologist (Fig. 5D). As an additional test, we applied our model to ST data from other tissue types: developmental heart, rheumatoid arthritis, and melanoma64,65,66. No TL-like structures were identified in the former whereas several TL-like structures were found in the latter two, as expected (Supplementary Figure 25 and Supplementary Note 2).

Although no clinical data were available for our patients, we reasoned that if the model we propose captures TLS-related features, it should be able to reproduce results from previous studies where such features are examined. We, therefore, applied our model to the same data set (TCGA skin cutaneous melanoma (SKCM) bulk RNA-seq) as Cabrita et al67. used to show how a TLS signal’s strength can be associated with overall survival. Encouragingly, we managed to successfully reproduce their result, linking patients of high predicted TLS score to better overall survival than patients with low or intermediate scores (Supplementary Figure 26 and Supplementary Note 3). Thus, our model not only generalized to other experimental platforms and external data but could also reproduce TLS-linked patient outcome results from bulk RNA-seq studies. These results strongly support our model’s ability to identify TL-like structures.


We have used the ST technique to study the spatial-gene expression profiles of eight HER2-positive tumors. The work can be decomposed into two main parts.

First, our study followed an outline similar to many single-cell studies; we normalized and clustered the data, extracted marker genes for each cluster by differential gene expression analysis, subjected the marker genes to functional enrichment analysis, and finally used the entire corpus of information to annotate the clusters. From a joint analysis of the clusters, we derived representative core signatures of the shared tumor and immune populations in our data set. The expression-based clusters agreed with the high-level pathologist annotations, but also gave a finer partition of the tissues and information about each region’s molecular profile. The value of this is manifold; manual annotation is labor-intensive, requires access to a trained pathologist, and does not—like ST data—enable molecular profiling of the regions (e.g., pathways enrichment).

In the second part of our study, we used an integrative method (stereoscope) to spatially map cell types/states defined in a single-cell data set; which allowed us to survey the spatial distribution of the cell types and their patterns of colocalization. From the colocalization trends, we concluded that: the epithelial cancer types tend to be dominant when present; B cells seem to spatially segregate from plasma cells; there was a type I interferon-associated coupling between the certain T-cell and Mø states in some patients, and B and T cells colocalize in several patients. We particularly focused on two of these interactions, between T-cell/Mø subsets and B cells/T cells.

The proximal location between the CXCL9/10-expressing Mø and T-cell states, observed in several samples, suggests that Mø2:CXCL10 could be recruiting the IFIT1 expressing T cells into specific locations. We reproduced our findings—an interaction between Mø and T-cell subsets in the presence of a type I interferon response—in two external data sets: a breast cancer (HER2-positive) tissue section and a larger data set from a different cancer type (SCC), both data sets were profiled with different experimental platforms. Colocalization between Mø and T cells had already been shown in the SCC data set using antibody-based methods as well as expression of the receptor CXCR3 and its ligands CXCL9/10/11; the authors also report a type I interferon response being present in their data, but do not explicitly link the two signals together60. Although other publications have documented related findings, more extensive efforts are required to understand the mechanism of the trifold interaction and its role in the tumor ecosystem 13,14.

The other colocalization signal, between B and T cells, was used as a proxy for sites resembling TLSs (here, TL-like structures), an approach validated by IHC staining. The staining showed the expected B and T-cell compartmentalization, but could not confirm the presence of mature germinal centers, implying immature TLSs or variable tissue quality. We used a linear regression model to identify TL-like structures based on gene expression, and extracted a gene signature from the model’s parameters. Several signature genes had previously been attributed important roles in TLS formation or function, and were not B- and T-cell specific, acting as further evidence of the method’s legitimacy. Despite its simplicity, the model generalized across techniques and tissues, it also reproduced previous results linking TLS prevalence to clinical outcome. Although further studies are necessary to confirm our findings, transcriptional profiling of TL-like structures in this manner could potentially reveal therapeutic targets for drugs aiming to promote anticancer immune responses.

We chose the stereoscope method to map single-cell data since it does not rely on marker genes, rather it operates with expression profiles, which is beneficial when attempting to resolve cell states defined by few and not mutually exclusive marker genes. The mapped single-cell data are considered representative of our tissues but have some shortcomings. Importantly, cell types such as neutrophils, mast cells, and adipocytes were not present in the single-cell data. Other cell types (e.g., certain dendritic cell subsets) were too few in numbers to enable robust mapping and hence omitted from the mapping. Still, our ST data and previous studies suggest the presence of these cell types, which have been shown to have important tumor-associated functions, in the tumor tissues2,68. Absence of cell types could also lead to slight misrepresentations of the cell type distributions. Furthermore, there is a risk of transcriptionally dominant cell types masking less abundant ones, as a majority of the captured transcripts would originate from the former. Finally, we believe that cell subsets defined by a proliferative state could incorrectly be mapped to the same region—i.e., presenting a false colocalization signal—if the proliferation process strongly dominates the expression profile of these cells, akin to observation in previous studies69. However, these concerns pertain to all methods relying on a reference data set, and are not unique to stereoscope. Taken together, we regard the integrative and expression-based clustering strategies as complementary methods that together establish a holistic image of our data.

Our study shows how ST data can be used to chart the molecular profiles of tissue samples in a disease context. It also hints at the potential impact future similar studies might have if coupled with clinical data. For example, cell colocalization patterns may be linked to patient outcome, used to assess drug responses in a spatially restricted manner within tumors, and to study functional interactions. In addition, the development of computational methods that enable the assessment of complex, non-linear, and hierarchical interactions between multiple cell types would benefit any prospective study.

To conclude, we demonstrate a transferable workflow for the comprehensive analysis of ST data applied to HER2-positive breast cancer tumors. Several findings emerged from our analysis that, if further explored, may help to better understand the underlying disease mechanisms and open up for new vantage points for therapy.

Article classification: Biological abstract
Share to:
Add:Room A11-329, 1st Floor, No.1, SBI Venture Street, Optics Valley, East Lake
New Technology Development Zone, Wuhan, China.
Certificate NO.:U18Q28010569R0S