Segmenting the SSAM cell type map

While we demonstrate the accuracy of SSAM in reconstructing celltype maps, we understand that many applications in biology require cell segmentation. As such, the development branch of SSAM supports segmentation of the celltype map using the watershed algorithm.

The run_watershed method takes the DAPI image as input and segments the celltype map using the watershed algorithm. The DAPI image is used as markers for the watershed segmentation. The segmentations and the segmented celltype map are stored in the watershed_segmentations and watershed_celltype_map attributes of the Dataset object.

The segmentation of the cell type map can be performed by:

# Load DAPI image
with open('zenodo/osmFISH/raw_data/im_nuc_small.pickle', 'rb') as f:
    dapi = pickle.load(f)
dapi_small = np.hstack([dapi.T[:1640], np.zeros([1640, 12])]).reshape(ds.vf_norm.shape)

# Threshold DAPI image to create markers
dapi_threshold = filters.threshold_local(dapi_small[..., 0], 35, offset=-0.0002)
dapi_thresh_im = dapi_small[..., 0] > dapi_threshold
dapi_thresh_im = dapi_thresh_im.reshape(ds.vf_norm.shape).astype(np.uint8) * 255

# Run watershed segmentation of cell-type maps with DAPI as markers
analysis.run_watershed(dapi_thresh_im, df) # df is the dataframe containing the spot locations

Below we demonstrate the application of the segmentation on the de novo celltype map generated for the mouse SSp osmFISH data.

After running the watershed segmentation, the cell by gene matrix and the center of mass of the segmented celltype map can be used to generate a cell by gene matrix for the segmented celltype map. This can be used for further downstream analysis such as cell-cell communication analysis.

The cell by gene matrix can be accessed by cell_by_gene_matrix and the center of mass of the segments can be accessed by center_of_masses attributes of the SSAMDataset object. Below we demonstrate a simple reanalysis of the segmented celltype map using Scanpy.

import scanpy as sc
adata = sc.AnnData(ds.cell_by_gene_matrix)

sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.scale(adata)
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)

sc.tl.umap(adata)
sc.tl.leiden(adata)
sc.pl.umap(adata, color='leiden')

plt.figure(figsize=(5, 5))
plt.scatter(ds.center_of_masses[:, 0], ds.center_of_masses[:, 1], c=[int(i) for i in adata.obs['leiden']], cmap='tab20', s=5)