Module contents¶

class ssam.SSAMAnalysis(dataset, ncores=-1, save_dir='', verbose=False)[source]¶

Bases: object

A class to run SSAM analysis.

Parameters:

dataset (SSAMDataset) – A SSAMDataset object.
ncores (int) – Number of cores for parallel computation. If a negative value is given, ((# of all available cores on system) - abs(ncores)) cores will be used.
save_dir (str) – Directory to store intermediate data (e.g. density / vector field). Any data which already exists will be loaded and reused.
verbose (bool) – If True, then it prints out messages during the analysis.
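
The rule for negative ncores values can be read as simple arithmetic. The sketch below follows the formula in the parameter description literally; resolve_ncores is a hypothetical helper written for illustration, not a function of the ssam package, and the floor of one worker is an added assumption.

```python
import os


def resolve_ncores(ncores, available=None):
    """Hypothetical helper: turn an ncores argument into a worker count."""
    if available is None:
        available = os.cpu_count() or 1
    if ncores < 0:
        # Documented rule: (# of all available cores) - abs(ncores)
        ncores = available - abs(ncores)
    # Assumption (not from the documentation): never go below one worker.
    return max(1, ncores)


print(resolve_ncores(-1, available=8))  # 7
print(resolve_ncores(4, available=8))   # 4
```

Under this reading, the default ncores=-1 leaves one core free on the machine.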

bin_celltypemaps(step=10, radius=100)[source]¶

Sweep a sphere window along a lattice on the image, and count the number of cell types in each window.

Parameters:	step (int) – The lattice spacing. radius (int) – The radius of the sphere window.
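
To make the roles of step and radius concrete, here is a minimal 2D sketch of a lattice sweep in plain Python. It is an illustration of the idea only: bin_celltypes, the label_map layout, and the use of -1 for unassigned pixels are assumptions, and the actual implementation may differ.

```python
def bin_celltypes(label_map, n_types, step, radius):
    """Count cell-type labels inside a circular window at each lattice point.

    label_map: 2D list of ints; -1 marks pixels with no assigned cell type
    (an assumption made for this sketch).
    Returns {(row, col): [count of type 0, count of type 1, ...]}.
    """
    height, width = len(label_map), len(label_map[0])
    counts = {}
    for cy in range(0, height, step):          # lattice rows, spaced by step
        for cx in range(0, width, step):       # lattice columns, spaced by step
            window = [0] * n_types
            for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
                for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                    inside = (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2
                    if inside and label_map[y][x] >= 0:
                        window[label_map[y][x]] += 1
            counts[(cy, cx)] = window
    return counts


# Toy 4 x 4 map: type 0 on the left half, type 1 on the right half.
toy = [[0, 0, 1, 1] for _ in range(4)]
result = bin_celltypes(toy, n_types=2, step=2, radius=1)
print(result[(0, 0)])  # [3, 0]
print(result[(2, 2)])  # [1, 4]
```

A smaller step gives a denser lattice, while a larger radius makes neighboring windows overlap more.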

calc_cell_type_compositions()[source]¶: Calculate cell type compositions in each domain.

calc_correlation_map(corr_size=3)[source]¶

Calculate local correlation map of the vector field.

Parameters:	corr_size (int) – Size of square (or cube) that is used to compute the local correlation values. This value should be an odd number.

calc_spatial_relationship()[source]¶: Calculate spatial relationship between the domains using the result of bin_celltypemaps.

cluster_vectors(pca_dims=10, min_cluster_size=0, resolution=0.6, prune=0.06666666666666667, snn_neighbors=30, max_correlation=1.0, metric='correlation', subclustering=False, dbscan_eps=0.4, centroid_correction_threshold=0.8, random_state=0)[source]¶

Cluster the given vectors using the specified clustering method.

Parameters:

pca_dims (int) – Number of principal components used for clustering.
min_cluster_size (int) – Set minimum cluster size.
resolution (float) – Resolution for Louvain community detection.
prune (float) – Threshold for Jaccard index (weight of SNN network). If it is smaller than prune, it is set to zero.
snn_neighbors (int) – Number of neighbors for SNN network.
max_correlation (float) – Clusters whose correlation is higher than this value will be merged.
metric (str) – Metric for calculation of distance between vectors in gene expression space.
subclustering (bool) – If True, each cluster will be clustered once again with DBSCAN algorithm to find more subclusters.
centroid_correction_threshold (float) – Centroid will be recalculated with the vectors which have the correlation to the cluster medoid equal or higher than this value.
random_state (int or random state object) – Random seed or scikit-learn’s random state object, used to reproduce the same result.
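
The prune parameter can be illustrated with the Jaccard index of two neighbor sets. This is a minimal sketch of the general shared-nearest-neighbor weighting idea, not the package's own code; jaccard and snn_weight are hypothetical names. The default value of prune equals 1/15.

```python
def jaccard(neighbors_a, neighbors_b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(neighbors_a), set(neighbors_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def snn_weight(neighbors_a, neighbors_b, prune=1 / 15):
    """Edge weight between two vectors; weights below prune are set to zero."""
    weight = jaccard(neighbors_a, neighbors_b)
    return weight if weight >= prune else 0.0


# Two vectors sharing 2 of 6 distinct neighbors: the edge is kept.
kept = snn_weight([1, 2, 3, 4], [3, 4, 5, 6])
# Two vectors with 30 neighbors each but only one in common: the edge is pruned.
pruned = snn_weight(range(0, 30), range(29, 59))
print(kept, pruned)
```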

exclude_and_merge_clusters(exclude=[], merge=[], centroid_correction_threshold=0.8)[source]¶

Exclude bad clusters (including the vectors in the clusters), and merge similar clusters for the downstream analysis.

Parameters:	exclude (list(int)) – List of cluster indices to be excluded. merge (list(list(int))) – List of list of cluster indices to be merged. centroid_correction_threshold (float) – Centroid will be recalculated with the vectors which have the correlation to the cluster medoid equal or higher than this value.
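
The shape of the exclude and merge arguments is easiest to see on a toy label list. The sketch below is an assumption-laden illustration: the helper name, the choice of -1 for excluded vectors, and keeping the first index of each merge group are not taken from the documentation, and the real method also recalculates centroids.

```python
def exclude_and_merge(labels, exclude=(), merge=()):
    """Relabel cluster assignments; excluded vectors get the label -1."""
    mapping = {}
    for group in merge:
        target = group[0]  # assumption: the merged cluster keeps the first index
        for idx in group:
            mapping[idx] = target
    relabeled = []
    for label in labels:
        if label in exclude:
            relabeled.append(-1)  # assumption: -1 marks an excluded vector
        else:
            relabeled.append(mapping.get(label, label))
    return relabeled


labels = [0, 1, 2, 3, 2, 4]
merged = exclude_and_merge(labels, exclude=[3], merge=[[1, 2]])
print(merged)  # [0, 1, 1, -1, 1, 4]
```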

exclude_and_merge_domains(exclude=[], merge=[])[source]¶

Manually exclude or merge domains.

Parameters:	exclude (list(int)) – Indices of the domains which will be excluded. merge (list(list(int))) – List of indices of the domains which will be merged.

expand_localmax(r=0.99, min_pixels=7, max_pixels=1000)[source]¶

Merge the vectors near the local maximum vectors. Only vectors with a high Pearson correlation to the local maximum are merged.

Parameters:	r (float) – Minimum Pearson’s correlation coefficient to look for the nearby vectors. min_pixels (float) – Minimum number of pixels to merge. max_pixels (float) – Maximum number of pixels to merge.

filter_celltypemaps(min_r=0.6, min_norm=0.1, fill_blobs=True, min_blob_area=0, filter_params={}, output_mask=None)[source]¶

Post-filter cell type maps created by map_celltypes.

Parameters:

min_r (float) – Minimum threshold of the correlation.
min_norm (str or float) – Minimum threshold of the vector norm. If a string is given instead, the threshold is determined automatically using one of scikit-image’s threshold filter functions (the functions in skimage.filters whose names start with threshold_).
fill_blobs (bool) – If True, the algorithm automatically fills holes in each blob.
min_blob_area (int) – Blobs with an area smaller than this value will be removed.
filter_params (dict) – Parameters passed to scikit-image’s threshold filter function. Not used when min_norm is a float.
output_mask (np.ndarray(bool)) – If given, the cell type maps will be filtered using the output mask.

find_domains(centroid_indices=[], n_clusters=10, norm_thres=0, merge_thres=0.6, merge_remote=True)[source]¶

Find domains in the image, using the result of bin_celltypemaps.

Parameters:

centroid_indices (list(int)) – The indices of the centroids which will be used to determine tissue domains.
n_clusters (int) – Initial number of clusters (domains) of agglomerative clustering.
norm_thres (int) – Threshold for the total number of cell types in each window. Windows containing fewer cell-type pixels than this value will be ignored.
merge_thres (float) – Threshold for merging domains. Domains whose centroids have a correlation higher than this value will be merged.
merge_remote (bool) – If true, allow merging clusters that are not adjacent to each other.

find_localmax(search_size=3, min_norm=0, min_expression=0, mask=None)[source]¶

Find local maxima vectors in the norm of the vector field.

Parameters:

search_size (int) – Size of square (or cube in 3D) that is used to search for the local maxima. This value should be an odd number.
min_norm (float) – Minimum value of norm at the local maxima.
min_expression (float) – Minimum value of gene expression in a unit pixel at the local maxima.
mask (numpy.ndarray, optional) – If given, find vectors in the masked region, instead of the whole image.
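
The interplay of search_size and min_norm can be sketched on a small 2D grid. This is a plain-Python illustration of a local-maximum search under stated assumptions (strict maxima, 2D only, no mask and no min_expression filter); find_local_maxima is a hypothetical name and not the package's implementation.

```python
def find_local_maxima(norm, search_size=3, min_norm=0.0):
    """Return (row, col) positions that are strict maxima of their window."""
    if search_size % 2 == 0:
        raise ValueError("search_size should be an odd number")
    half = search_size // 2
    height, width = len(norm), len(norm[0])
    maxima = []
    for y in range(height):
        for x in range(width):
            value = norm[y][x]
            if value < min_norm:
                continue  # below the minimum norm, not a candidate
            neighbors = [
                norm[ny][nx]
                for ny in range(max(0, y - half), min(height, y + half + 1))
                for nx in range(max(0, x - half), min(width, x + half + 1))
                if (ny, nx) != (y, x)
            ]
            if all(value > n for n in neighbors):
                maxima.append((y, x))
    return maxima


toy_norm = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.5],
    [0.0, 0.0, 0.0, 0.1],
]
peaks = find_local_maxima(toy_norm, search_size=3, min_norm=0.3)
print(peaks)  # [(1, 1), (2, 3)]
```

An odd search_size keeps the window centered on the pixel under test, which is why even values are rejected in this sketch.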

map_celltypes(centroids=None)[source]¶

Create correlation maps between the centroids and the vector field. Each correlation map corresponds to one cell type map.

Parameters:	centroids (list(np.array(int))) – If given, map celltypes with the given cluster centroids.
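
A correlation map of this kind can be sketched as the Pearson correlation between one centroid and the expression vector at every pixel. The code below is a plain-Python illustration with made-up numbers; pearson and correlation_map are hypothetical helpers, and returning 0.0 for flat vectors is an assumption.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if std_u == 0 or std_v == 0:
        return 0.0  # assumption: flat vectors get zero correlation
    return cov / (std_u * std_v)


def correlation_map(vector_field, centroid):
    """vector_field: 2D grid whose cells are per-pixel gene-expression vectors."""
    return [[pearson(vec, centroid) for vec in row] for row in vector_field]


centroid = [5.0, 1.0, 0.0]
field = [
    [[10.0, 2.0, 0.0], [0.0, 1.0, 5.0]],
    [[0.0, 0.0, 0.0], [4.0, 1.0, 1.0]],
]
corr = correlation_map(field, centroid)
print(corr[0][0], corr[0][1])
```

Pixels whose expression profile is proportional to the centroid score close to 1, which is what a threshold such as min_r in filter_celltypemaps then selects.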

normalize_vectors(use_expanded_vectors=False, normalize_gene=False, normalize_vector=False, normalize_median=False, size_after_normalization=10000.0, log_transform=False, scale=False)[source]¶

Normalize and regularize vectors.

Parameters:

use_expanded_vectors (bool) – If True, use averaged vectors nearby local maxima of the vector field.
normalize_gene (bool) – If True, normalize vectors by sum of each gene expression across all vectors.
normalize_vector (bool) – If True, normalize vectors by sum of all gene expression of each vector.
log_transform (bool) – If True, vectors are log transformed.
scale (bool) – If True, vectors are z-scaled (mean centered and scaled by stdev).
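
The three documented steps (per-vector normalization, log transform, z-scaling) can be sketched in plain Python as follows. Several details are assumptions rather than documented behavior: the size factor (loosely modeled on size_after_normalization in the signature), the use of log(1 + x), and scaling per gene with the population standard deviation.

```python
import math


def normalize_by_vector_sum(vectors, size=10000.0):
    """Divide each vector by its total expression, then rescale to size."""
    out = []
    for vec in vectors:
        total = sum(vec)
        out.append([x / total * size if total else 0.0 for x in vec])
    return out


def log_transform(vectors):
    """Assumption: log(1 + x), so that zero expression stays zero."""
    return [[math.log1p(x) for x in vec] for vec in vectors]


def z_scale(vectors):
    """Assumption: center and scale each gene (column) across all vectors."""
    n = len(vectors)
    scaled_cols = []
    for col in zip(*vectors):
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        scaled_cols.append([(x - mean) / std if std else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]


vectors = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]
normalized = normalize_by_vector_sum(vectors, size=100.0)
scaled = z_scale(log_transform(normalized))
print(normalized)  # [[25.0, 75.0], [50.0, 50.0], [0.0, 100.0]]
```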

normalize_vectors_sctransform(use_expanded_vectors=False, normalize_vf=True, vst_kwargs={})[source]¶

Normalize and regularize vectors using SCtransform.

Parameters:	use_expanded_vectors (bool) – If True, use averaged vectors nearby local maxima of the vector field. normalize_vf (bool) – If True, the vector field is also normalized using the same parameters used to normalize the local maxima. vst_kwargs (dict) – Optional keywords arguments for sctransform’s vst function.

rescue_cluster(gene_names, expression_thresholds=[])[source]¶

run_fast_kde(kernel='gaussian', bandwidth=2.5, sampling_distance=1.0, re_run=False, use_mmap=False)[source]¶

Run KDE faster than run_kde method. This method uses precomputed kernels to estimate density of mRNA.

Parameters:

kernel (str) – Kernel for density estimation. Currently only Gaussian kernel is supported.
bandwidth (float) – Parameter to adjust the width of the kernel. Set it to 2.5 to make the FWTM (full width at tenth of maximum) of the Gaussian kernel ~10 um (assuming that the average cell diameter is ~10 um).
sampling_distance (float) – Grid spacing in um. Currently only 1 um is supported.
re_run (bool) – Recomputes KDE, ignoring all existing precomputed densities in the data directory.
use_mmap (bool) – Use MMAP to reduce memory usage during analysis. Currently not implemented; this option should always be disabled.
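
The relation between the bandwidth and the FWTM quoted above can be checked with a few lines of arithmetic. The sketch assumes that bandwidth is the standard deviation of the Gaussian kernel, expressed in um; gaussian_fwtm is a hypothetical helper.

```python
import math


def gaussian_fwtm(sigma):
    """Full width at tenth of maximum of a Gaussian with standard deviation sigma.

    Solving exp(-x**2 / (2 * sigma**2)) = 0.1 gives x = sigma * sqrt(2 * ln 10),
    and the full width is twice that distance.
    """
    return 2.0 * math.sqrt(2.0 * math.log(10.0)) * sigma


print(round(gaussian_fwtm(2.5), 2))  # 10.73
```

Under that assumption, a bandwidth of 2.5 gives a kernel that falls to a tenth of its peak over roughly the diameter of a 10 um cell.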

run_kde(kernel='gaussian', bandwidth=2.5, sampling_distance=1.0, use_mmap=False)[source]¶

Run KDE to estimate density of mRNA.

Parameters:

kernel (str) – Kernel for density estimation.
bandwidth (float) – Parameter to adjust the width of the kernel. Set it to 2.5 to make the FWTM (full width at tenth of maximum) of the Gaussian kernel ~10 um (assuming that the average cell diameter is ~10 um).
sampling_distance (float) – Grid spacing in um.
use_mmap (bool) – Use MMAP to reduce memory usage during analysis. Turning on this option can reduce the amount of memory used by SSAM analysis, but also lower the analysis speed.

class ssam.SSAMDataset(genes, locations, width, height, depth=1)[source]¶

Bases: object

A class to store initial values and results of SSAM analysis.

Parameters:

genes (list(str)) – The genes that will be used for the analysis.
locations (list(numpy.ndarray)) – Location of the mRNAs in um, given as a list of N x D ndarrays (N is number of mRNAs, D is number of dimensions).
width (float) – Width of the image in um.
height (float) – Height of the image in um.
depth (float) – Depth of the image in um. Depth == 1 means 2D image.

get_celltype_correlation(idx)[source]¶

Get the correlation values of a cell type map, computed between the given cluster’s centroid and the vector field.

Parameters:	idx (int) – Index of a cluster
Returns:	Correlation values of a cell type map of the specified cluster’s centroid
Return type:	numpy.ndarray

plot_celltype_composition(domain_index, cell_type_colors=None, cell_type_cmap='jet', cell_type_orders=None, label_cutoff=0.03, pctdistance=1.15, **kwargs)[source]¶

Plot composition of cell types in each domain.

Parameters:

domain_index (int) – Index of the domain.
cell_type_colors (str or list(float)) – The colors of the cell types. Overrides cell_type_cmap parameter.
cell_type_cmap (str or matplotlib.colors.Colormap) – The colormap for the cell types.
label_cutoff (float) – The minimum cutoff of the labeling of the percentage. From 0 to 1.
pctdistance (float) – The distance from center of the pie to the labels.
kwargs – More keyword arguments for matplotlib.pyplot.pie.

plot_celltypes_map(background='black', centroid_indices=[], colors=None, cmap='jet', rotate=0, min_r=0.6, set_alpha=False, z=None)[source]¶

Plot the merged cell-type map.

Parameters:

background (str or list(float)) – Set background color of the cell-type map.
centroid_indices (list(int)) – The centroids which will be in the cell type map. If not given, the cell-type map is drawn with all centroids.
colors (list(str), list(list(float))) – Color of the clusters. Overrides cmap parameter.
cmap (str or matplotlib.colors.Colormap) – Colormap for the clusters.
rotate (int) – Rotate the plot. Possible values are 0, 1, 2, and 3.
min_r (float) – Minimum correlation threshold for the cell-type map. This value is used only for plotting and does not affect the cell-type maps generated by filter_celltypemaps.
set_alpha (bool) – Set alpha of each pixel based on the correlation. Not fully implemented yet; it does not work properly with a background other than black.
z (int) – Z index to slice 3D cell-type map. If not given, the slice at the middle will be used.

plot_correlation_map(cmap='hot')[source]¶

Plot the correlations near the vectors in the vector field (Not fully implemented yet).

Parameters:	cmap – Colormap for the image.

plot_diagnostic_plot(centroid_index, cluster_name=None, cluster_color=None, cmap=None, rotate=0, z=None, use_embedding='tsne', known_signatures=[], correlation_methods=[])[source]¶

Plot the diagnostic plot. This method requires that plot_tsne or plot_umap has been run at least once beforehand.

Parameters:

centroid_index (int) – Index of the centroid for the diagnostic plot.
cluster_name (str) – The name of the cluster.
cluster_color (str or list(float)) – The color of the cluster. Overrides cmap parameter.
cmap (str or matplotlib.colors.Colormap) – The colormap for the clusters. The cluster color is determined using the centroid_index th color of the given colormap.
rotate (int) – Rotate the plot. Possible values are 0, 1, 2, and 3.
z (int) – Z index to slice 3D vector norm and cell-type map plots. If not given, the slice at the middle will be used.
use_embedding (str) – The type of the embedding for the last panel. Possible values are “tsne” or “umap”.
known_signatures (list(tuple)) – The list of known signatures, which will be displayed in the 3rd panel. Each signature can be 3-tuple or 4-tuple, containing 1) the name of signature, 2) gene labels of the signature, 3) gene expression values of the signature, 4) optionally the color of the signature.
correlation_methods (list(tuple)) – The correlation methods used to determine the max correlation of the centroid to the known_signatures. Each method should be a 2-tuple containing 1) the name of the correlation, 2) the correlation function (compatible with the correlation methods available in scipy.stats).
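
The tuple layout expected for known_signatures is easier to see with a concrete value. The sketch below builds one 3-tuple and one 4-tuple and validates their shape; parse_signature is a hypothetical helper, and the signature names, gene names, and expression values are made up for illustration.

```python
def parse_signature(signature):
    """Split a known-signature tuple into (name, genes, values, color)."""
    if len(signature) not in (3, 4):
        raise ValueError("each signature must be a 3-tuple or a 4-tuple")
    name, genes, values = signature[:3]
    if len(genes) != len(values):
        raise ValueError("gene labels and expression values must align")
    color = signature[3] if len(signature) == 4 else None
    return name, list(genes), list(values), color


# Hypothetical signatures; names, genes, and values are placeholders.
known_signatures = [
    ("Type A", ["GeneX", "GeneY"], [3.2, 0.4]),         # 3-tuple, no color
    ("Type B", ["GeneX", "GeneY"], [0.1, 2.7], "red"),  # 4-tuple with color
]
parsed = [parse_signature(s) for s in known_signatures]
print(parsed[1][3])  # red
```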

plot_domains(background='white', colors=None, cmap='jet', rotate=0, domain_background=False, background_alpha=0.3, z=None)[source]¶

Plot tissue domains.

Parameters:

background (str or list(float)) – Background color of the plot.
colors (list(str), list(list(float))) – Color of the domains. Overrides cmap parameter.
cmap (str or matplotlib.colors.Colormap) – Colormap for the domains.
rotate (int) – Rotate the plot. Possible values are 0, 1, 2, and 3.
domain_background (bool) – Show the area of the inferred domains behind the domain map.
background_alpha (float) – The alpha value of the area of the inferred domains.
z (int) – Z index to slice 3D domain map. If not given, the slice at the middle will be used.

plot_expanded_mask(cmap='Greys')[source]¶

Plot the expanded area of the vectors (Not fully implemented yet).

Parameters:	cmap – Colormap for the mask.

plot_l1norm(cmap='viridis', rotate=0, z=None)[source]¶

Plot the L1-norm of the vector field.

Parameters:	cmap (str or matplotlib.colors.Colormap) – Colormap used for the plot. rotate (int) – Rotate the plot. Possible values are 0, 1, 2, and 3. z (int) – Z index to slice 3D vector field. If not given, the slice at the middle will be plotted.

plot_localmax(c=None, cmap=None, s=1, rotate=0)[source]¶

Scatter plot the local maxima.

Parameters:	c (str or list(str), or list(float) or list(list(float))) – Color of the scatter dots. Overrides cmap parameter. cmap (str or matplotlib.colors.Colormap) – Colormap of the scatter dots. s – Size of the scatter dots. rotate (int) – Rotate the plot. Possible values are 0, 1, 2, and 3.

plot_spatial_relationships(cluster_labels, *args, **kwargs)[source]¶

Plot spatial relationship between cell types, presented as a heatmap.

Parameters:	cluster_labels (list(str)) – x- and y-axis label of the heatmap. args – More arguments for the seaborn.heatmap. kwargs – More keyword arguments for the seaborn.heatmap.

plot_tsne(run_tsne=False, pca_dims=10, n_iter=5000, perplexity=70, early_exaggeration=10, metric='correlation', exclude_bad_clusters=True, s=None, random_state=0, colors=[], excluded_color='#00000033', cmap='jet', tsne_kwargs={})[source]¶

Scatter plot the tSNE embedding.

Parameters:

run_tsne (bool) – If false, this method tries to load precomputed tSNE result before running tSNE.
pca_dims (int) – Number of PCA dimensions used for the tSNE embedding.
n_iter (int) – Maximum number of iterations for the tSNE.
perplexity (float) – The perplexity value of the tSNE (please refer to the section How should I set the perplexity in t-SNE? in this link).
early_exaggeration (float) – Early exaggeration parameter for tSNE. Controls the tightness of the resulting tSNE plot.
metric (str) – Metric for calculation of distance between vectors in gene expression space.
exclude_bad_clusters (bool) – If true, the vectors that are excluded by the clustering algorithm will not be considered for tSNE computation.
s (float) – Size of the scatter dots.
random_state (int or random state object) – Random seed or scikit-learn’s random state object, used to reproduce the same result.
colors (list(str), list(list(float))) – Color of each cluster.
excluded_color (str or list(float)) – Color of the vectors excluded by the clustering algorithm.
cmap (str or matplotlib.colors.Colormap) – Colormap for the clusters.
tsne_kwargs (dict) – Other keyword parameters for tSNE.

plot_umap(run_umap=False, pca_dims=10, metric='correlation', exclude_bad_clusters=True, s=None, random_state=0, colors=[], excluded_color='#00000033', cmap='jet', umap_kwargs={})[source]¶

Scatter plot the UMAP embedding.

Parameters:

run_umap (bool) – If false, this method tries to load precomputed UMAP result before running UMAP.
pca_dims (int) – Number of PCA dimensions used for the UMAP embedding.
metric (str) – Metric for calculation of distance between vectors in gene expression space.
exclude_bad_clusters (bool) – If true, the vectors that are excluded by the clustering algorithm will not be considered for UMAP computation.
s (float) – Size of the scatter dots.
random_state (int or random state object) – Random seed or scikit-learn’s random state object, used to reproduce the same result.
colors (list(str), list(list(float))) – Color of each cluster.
excluded_color (str or list(float)) – Color of the vectors excluded by the clustering algorithm.
cmap (str or matplotlib.colors.Colormap) – Colormap for the clusters.
umap_kwargs (dict) – Other keyword parameters for UMAP.

vf¶: Vector field as a numpy.ndarray.

vf_norm¶: L1-norm of the vector field as a numpy.ndarray.

ssam.run_sctransform(data, clip_range=None, verbose=True, debug_path=None, plot_model_pars=False, **kwargs)[source]¶

Run the ‘sctransform’ R package and return the normalized matrix and the model parameters. The ‘feather’ package is used for the data exchange between R and Python.

Parameters:

data (numpy.ndarray) – N x D ndarray to normalize (N is number of samples, D is number of dimensions).
kwargs – Any keyword arguments passed to the R function vst.

Returns:	A 2-tuple containing two pandas.DataFrame objects: 1) the normalized N x D matrix, 2) the determined model parameters.