Module contents¶
-
class
ssam.
SSAMAnalysis
(dataset, ncores=-1, save_dir='', verbose=False)[source]¶ Bases:
object
A class to run SSAM analysis.
Parameters: - dataset (SSAMDataset) – A SSAMDataset object.
- ncores (int) – Number of cores for parallel computation. If a negative value is given, ((# of all available cores on system) - abs(ncores)) cores will be used.
- save_dir (str) – Directory to store intermediate data (e.g. density / vector field). Any data which already exists will be loaded and reused.
- verbose (bool) – If True, then it prints out messages during the analysis.
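The ncores rule above can be read literally as a small calculation. The sketch below is only an illustration: resolve_ncores is a hypothetical helper, not part of ssam, and clamping the result to at least one worker is an assumption, since the behaviour for very negative values is not described here.

```python
import multiprocessing


def resolve_ncores(ncores, total=None):
    """Hypothetical helper: turn an ncores argument into a worker count."""
    if total is None:
        total = multiprocessing.cpu_count()
    if ncores < 0:
        # Documented rule: (all available cores) - abs(ncores).
        ncores = total - abs(ncores)
    # Assumption: never return fewer than one worker.
    return max(1, ncores)


print(resolve_ncores(-1, total=8))
```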
-
bin_celltypemaps
(step=10, radius=100)[source]¶ Sweep a spherical window along a lattice on the image, and count the number of cell types in each window.
Parameters:
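To make the window sweep concrete, here is a minimal 2D sketch in plain Python. It is not the ssam implementation: the function name, the use of -1 for unassigned pixels, the circular (rather than spherical) window and the lattice starting at pixel 0 are all assumptions made for illustration.

```python
def bin_celltype_map(ctmap, n_types, step=10, radius=100):
    """Count cell-type pixels inside a circular window at each lattice point.

    ctmap is a 2D list of integer labels; -1 means "no cell type" (assumption).
    """
    height, width = len(ctmap), len(ctmap[0])
    r2 = radius * radius
    bins = []
    for cy in range(0, height, step):
        row = []
        for cx in range(0, width, step):
            counts = [0] * n_types
            for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
                for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                    label = ctmap[y][x]
                    inside = (y - cy) ** 2 + (x - cx) ** 2 <= r2
                    if label >= 0 and inside:
                        counts[label] += 1
            row.append(counts)
        bins.append(row)
    return bins


toy_map = [[0, 0, 1],
           [0, -1, 1],
           [1, 1, 1]]
toy_bins = bin_celltype_map(toy_map, n_types=2, step=2, radius=1)
```

Each entry of the result is a per-window count vector; vectors of this kind are what a downstream domain-finding step would cluster.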
-
calc_correlation_map
(corr_size=3)[source]¶ Calculate local correlation map of the vector field.
Parameters: corr_size (int) – Size of square (or cube) that is used to compute the local correlation values. This value should be an odd number.
-
calc_spatial_relationship
()[source]¶ Calculate spatial relationship between the domains using the result of bin_celltypemaps.
-
cluster_vectors
(pca_dims=10, min_cluster_size=0, resolution=0.6, prune=0.06666666666666667, snn_neighbors=30, max_correlation=1.0, metric='correlation', subclustering=False, dbscan_eps=0.4, centroid_correction_threshold=0.8, random_state=0)[source]¶ Cluster the given vectors using the specified clustering method.
Parameters: - pca_dims (int) – Number of principal components used for clustering.
- min_cluster_size (int) – Set minimum cluster size.
- resolution (float) – Resolution for Louvain community detection.
- prune (float) – Threshold for Jaccard index (weight of SNN network). If it is smaller than prune, it is set to zero.
- snn_neighbors (int) – Number of neighbors for SNN network.
- max_correlation (float) – Clusters whose correlation to each other is higher than this value will be merged.
- metric (str) – Metric for calculation of distance between vectors in gene expression space.
- subclustering (bool) – If True, each cluster will be clustered once again with DBSCAN algorithm to find more subclusters.
- centroid_correction_threshold (float) – Centroid will be recalculated with the vectors which have the correlation to the cluster medoid equal or higher than this value.
- random_state (int or random state object) – Random seed or scikit-learn’s random state object to replicate the same result
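The prune parameter can be illustrated with a short sketch of how a shared-nearest-neighbor (SNN) edge weight is commonly computed. This is not taken from the ssam source: snn_edge_weight is a hypothetical helper, and defining the weight as the Jaccard index of the two neighbor sets is an assumption based on the parameter description above.

```python
def snn_edge_weight(neighbors_a, neighbors_b, prune=1 / 15):
    """Hypothetical helper: Jaccard index of two neighbor sets, pruned.

    Weights smaller than prune are set to zero, as the prune description says.
    """
    a, b = set(neighbors_a), set(neighbors_b)
    union = a | b
    if not union:
        return 0.0
    jaccard = len(a & b) / len(union)
    return jaccard if jaccard >= prune else 0.0


print(snn_edge_weight([1, 2, 3], [2, 3, 4]))
```

The default prune value of 0.0666... equals 1/15, so two vectors with 30 neighbors each need to share at least 4 of them (4/56 ≈ 0.071) for the edge to survive under this reading.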
-
exclude_and_merge_clusters
(exclude=[], merge=[], centroid_correction_threshold=0.8)[source]¶ Exclude bad clusters (including the vectors in the clusters), and merge similar clusters for the downstream analysis.
Parameters: - exclude (list(int)) – List of cluster indices to be excluded.
- merge (list(list(int))) – List of list of cluster indices to be merged.
- centroid_correction_threshold (float) – Centroid will be recalculated with the vectors which have the correlation to the cluster medoid equal or higher than this value.
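The shapes of the exclude and merge arguments are easiest to see on a toy label array. The sketch below is a hypothetical relabelling routine, not the ssam implementation: marking excluded vectors with -1, keeping the lowest index of each merged group, and renumbering the remaining clusters consecutively are all assumptions.

```python
def exclude_and_merge(labels, exclude=(), merge=()):
    """Hypothetical relabelling of cluster labels; -1 marks excluded vectors."""
    excluded = set(exclude)
    target = {}
    for group in merge:
        keep = min(group)  # assumption: merged cluster keeps the lowest index
        for idx in group:
            target[idx] = keep
    merged = [-1 if lab in excluded else target.get(lab, lab) for lab in labels]
    # Assumption: surviving clusters are renumbered 0..k-1.
    remaining = sorted({lab for lab in merged if lab >= 0})
    remap = {old: new for new, old in enumerate(remaining)}
    return [remap.get(lab, -1) for lab in merged]


labels = [0, 1, 2, 3, 4, 1, 3]
# exclude is a flat list of indices; merge is a list of index groups.
relabelled = exclude_and_merge(labels, exclude=[2], merge=[[0, 1], [3, 4]])
print(relabelled)
```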
-
exclude_and_merge_domains
(exclude=[], merge=[])[source]¶ Manually exclude or merge domains.
Parameters:
-
expand_localmax
(r=0.99, min_pixels=7, max_pixels=1000)[source]¶ Merge the vectors near the local maximum vectors. Only vectors with high Pearson correlation values are merged.
Parameters:
-
filter_celltypemaps
(min_r=0.6, min_norm=0.1, fill_blobs=True, min_blob_area=0, filter_params={}, output_mask=None)[source]¶ Post-filter cell type maps created by map_celltypes.
Parameters: - min_r (float) – minimum threshold of the correlation.
- min_norm (str or float) – minimum threshold of the vector norm. If a string is given instead, the threshold is determined automatically using one of scikit-image's threshold filter functions (the skimage.filters functions whose names start with threshold_).
- fill_blobs (bool) – If true, the algorithm automatically fills holes in each blob.
- min_blob_area (int) – Blobs with an area smaller than this value will be removed.
- filter_params (dict) – Filter parameters passed to the scikit-image threshold filter function. Not used when min_norm is a float.
- output_mask (np.ndarray(bool)) – If given, the cell type maps will be filtered using the output mask.
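The two numeric thresholds can be pictured as a per-pixel mask. The following plain-Python sketch is an illustration only: the function name, the -1 convention for unassigned pixels and the use of inclusive comparisons are assumptions, and blob filling, blob-area filtering and string-valued min_norm are left out.

```python
def filter_celltype_map(labels, corr, norm, min_r=0.6, min_norm=0.1):
    """Keep a pixel's label only if its correlation and norm pass both cutoffs.

    labels, corr and norm are 2D lists of the same shape; -1 means unassigned.
    """
    filtered = []
    for lab_row, r_row, n_row in zip(labels, corr, norm):
        filtered.append([
            lab if (lab >= 0 and r >= min_r and n >= min_norm) else -1
            for lab, r, n in zip(lab_row, r_row, n_row)
        ])
    return filtered


toy_labels = [[0, 1], [1, 0]]
toy_corr = [[0.9, 0.5], [0.7, 0.8]]
toy_norm = [[0.5, 0.5], [0.05, 0.2]]
toy_filtered = filter_celltype_map(toy_labels, toy_corr, toy_norm)
print(toy_filtered)
```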
-
find_domains
(centroid_indices=[], n_clusters=10, norm_thres=0, merge_thres=0.6, merge_remote=True)[source]¶ Find domains in the image, using the result of bin_celltypemaps.
Parameters: - centroid_indices (list(int)) – The indices of the centroids which will be used to determine tissue domains.
- n_clusters (int) – Initial number of clusters (domains) of agglomerative clustering.
- norm_thres (int) – Threshold for the total number of cell types in each window. Windows that contain fewer cell-type pixels than this value will be ignored.
- merge_thres (float) – Threshold for merging domains. Domains whose centroids have a correlation higher than this value will be merged.
- merge_remote (bool) – If true, allow merging clusters that are not adjacent to each other.
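The merge_thres rule can be sketched as grouping centroids whose pairwise Pearson correlation exceeds the threshold. This is a simplified illustration, not the ssam algorithm: pearson and merge_groups are hypothetical helpers, merging is done transitively in a single pass, and adjacency (merge_remote) is ignored.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def merge_groups(centroids, merge_thres=0.6):
    """Return a group id per centroid; correlated centroids share an id."""
    parent = list(range(len(centroids)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if pearson(centroids[i], centroids[j]) > merge_thres:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(centroids))]


toy_centroids = [[1, 2, 3], [2, 4, 6.5], [3, 2, 1]]
print(merge_groups(toy_centroids))
```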
-
find_localmax
(search_size=3, min_norm=0, min_expression=0, mask=None)[source]¶ Find local maxima vectors in the norm of the vector field.
Parameters: - search_size (int) – Size of square (or cube in 3D) that is used to search for the local maxima. This value should be an odd number.
- min_norm (float) – Minimum value of norm at the local maxima.
- min_expression (float) – Minimum value of gene expression in a unit pixel at the local maxima.
- mask (numpy.ndarray, optional) – If given, find vectors in the masked region, instead of the whole image.
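A local-maximum search with an odd-sized square window can be sketched in a few lines. This 2D, plain-Python version is an illustration under assumptions rather than the ssam code: the function name is reused only for readability, ties within a window all count as maxima, the norm must be strictly greater than min_norm (so empty pixels are never reported), and min_expression and mask are omitted.

```python
def find_localmax(norm, search_size=3, min_norm=0.0):
    """Return (y, x) positions that are maximal within a square window."""
    if search_size % 2 == 0:
        raise ValueError("search_size should be an odd number")
    half = search_size // 2
    height, width = len(norm), len(norm[0])
    maxima = []
    for y in range(height):
        for x in range(width):
            value = norm[y][x]
            if value <= min_norm:  # assumption: strict cutoff
                continue
            window = [
                norm[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            if value == max(window):
                maxima.append((y, x))
    return maxima


toy_norm = [[0, 0, 0, 0],
            [0, 5, 1, 0],
            [0, 1, 0, 0],
            [0, 0, 0, 3]]
print(find_localmax(toy_norm))
```

The odd-size requirement follows from the window being centered on the pixel under test: half = search_size // 2 pixels are examined on each side.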
-
map_celltypes
(centroids=None)[source]¶ Create correlation maps between the centroids and the vector field. Each correlation map corresponds to one cell type map.
Parameters: centroids (list(np.array(int))) – If given, map celltypes with the given cluster centroids.
-
normalize_vectors
(use_expanded_vectors=False, normalize_gene=False, normalize_vector=False, normalize_median=False, size_after_normalization=10000.0, log_transform=False, scale=False)[source]¶ Normalize and regularize vectors
Parameters: - use_expanded_vectors (bool) – If True, use averaged vectors nearby local maxima of the vector field.
- normalize_gene (bool) – If True, normalize vectors by sum of each gene expression across all vectors.
- normalize_vector (bool) – If True, normalize vectors by sum of all gene expression of each vector.
- log_transform (bool) – If True, vectors are log transformed.
- scale (bool) – If True, vectors are z-scaled (mean centered and scaled by stdev).
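The per-vector normalization, log transform and z-scaling options can be sketched as follows. This is a hedged illustration, not the ssam implementation: the function below is a stand-in with the same flag names, the log transform is assumed to be log(1 + x), z-scaling is assumed to be applied per gene (column) with the population standard deviation, and normalize_gene, normalize_median and size_after_normalization are left out because their exact behaviour is not described here.

```python
import math


def normalize_vectors(vectors, normalize_vector=False, log_transform=False,
                      scale=False):
    """Sketch of three of the documented options on a list of vectors."""
    data = [[float(x) for x in v] for v in vectors]
    if normalize_vector:
        # Divide each vector by the sum of its gene expression values.
        data = [[x / sum(v) for x in v] if sum(v) > 0 else v for v in data]
    if log_transform:
        data = [[math.log1p(x) for x in v] for v in data]  # assumption: log1p
    if scale:
        # Assumption: z-scale each gene (column) across all vectors.
        cols = list(zip(*data))
        means = [sum(c) / len(c) for c in cols]
        stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
                for c, m in zip(cols, means)]
        data = [[(x - m) / s if s > 0 else 0.0
                 for x, m, s in zip(v, means, stds)] for v in data]
    return data


print(normalize_vectors([[1, 3], [2, 2]], normalize_vector=True))
```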
-
normalize_vectors_sctransform
(use_expanded_vectors=False, normalize_vf=True, vst_kwargs={})[source]¶ Normalize and regularize vectors using SCtransform
Parameters: - use_expanded_vectors (bool) – If True, use averaged vectors nearby local maxima of the vector field.
- normalize_vf (bool) – If True, the vector field is also normalized using the same parameters used to normalize the local maxima.
- vst_kwargs (dict) – Optional keywords arguments for sctransform’s vst function.
-
run_fast_kde
(kernel='gaussian', bandwidth=2.5, sampling_distance=1.0, re_run=False, use_mmap=False)[source]¶ Run KDE faster than run_kde method. This method uses precomputed kernels to estimate density of mRNA.
Parameters: - kernel (str) – Kernel for density estimation. Currently only Gaussian kernel is supported.
- bandwidth (float) – Parameter to adjust the width of the kernel. Set it to 2.5 so that the FWTM (full width at tenth maximum) of the Gaussian kernel is ~10 um (assuming that the average cell diameter is ~10 um).
- sampling_distance (float) – Grid spacing in um. Currently only 1 um is supported.
- re_run (bool) – Recomputes KDE, ignoring all existing precomputed densities in the data directory.
- use_mmap (bool) – Use MMAP to reduce memory usage during analysis. Currently not implemented; this option should always be disabled.
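The relationship between a bandwidth of 2.5 and a ~10 um width can be checked numerically. The sketch assumes that bandwidth is the standard deviation of the Gaussian kernel in um; gaussian_fwtm is a hypothetical helper name. A Gaussian falls to one tenth of its peak at a distance of sigma * sqrt(2 ln 10) from the center, so the full width is twice that.

```python
import math


def gaussian_fwtm(sigma):
    """Full width at tenth maximum of a Gaussian with standard deviation sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(10.0)) * sigma


print(round(gaussian_fwtm(2.5), 2))  # 10.73
```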
-
run_kde
(kernel='gaussian', bandwidth=2.5, sampling_distance=1.0, use_mmap=False)[source]¶ Run KDE to estimate density of mRNA.
Parameters: - kernel (str) – Kernel for density estimation.
- bandwidth (float) – Parameter to adjust the width of the kernel. Set it to 2.5 so that the FWTM (full width at tenth maximum) of the Gaussian kernel is ~10 um (assuming that the average cell diameter is ~10 um).
- sampling_distance (float) – Grid spacing in um.
- use_mmap (bool) – Use MMAP to reduce memory usage during analysis. Turning on this option can reduce the amount of memory used by SSAM analysis, but also lower the analysis speed.
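For readers unfamiliar with kernel density estimation on a grid, a brute-force 2D sketch follows. It only illustrates the idea behind the bandwidth and sampling_distance parameters and is not how ssam computes densities: the function name, the grid starting at coordinate 0, and the normalization of the kernel to unit integral are assumptions.

```python
import math


def gaussian_kde_grid(points, width, height, bandwidth=2.5,
                      sampling_distance=1.0):
    """Evaluate a sum of 2D Gaussian kernels on a regular grid.

    points is a list of (x, y) mRNA positions in um; returns a 2D list
    indexed as grid[iy][ix].
    """
    nx = int(round(width / sampling_distance))
    ny = int(round(height / sampling_distance))
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)  # assumption: unit integral
    grid = [[0.0] * nx for _ in range(ny)]
    for iy in range(ny):
        gy = iy * sampling_distance
        for ix in range(nx):
            gx = ix * sampling_distance
            total = 0.0
            for px, py in points:
                d2 = (gx - px) ** 2 + (gy - py) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            grid[iy][ix] = norm * total
    return grid


density = gaussian_kde_grid([(5.0, 5.0)], width=10, height=10)
```

One such density map per gene, stacked along a gene axis, gives a vector at every grid position, which is the sense in which the result is a vector field.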
-
class
ssam.
SSAMDataset
(genes, locations, width, height, depth=1)[source]¶ Bases:
object
A class to store initial values and results of SSAM analysis.
Parameters: - genes (list(str)) – The genes that will be used for the analysis.
- locations (list(numpy.ndarray)) – Location of the mRNAs in um, given as a list of N x D ndarrays (N is number of mRNAs, D is number of dimensions).
- width (float) – Width of the image in um.
- height (float) – Height of the image in um.
- depth (float) – Depth of the image in um. Depth == 1 means 2D image.
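The expected input layout (one coordinate array per gene, in the same order as genes) can be prepared from a flat table of spots. The sketch below uses plain lists so that it runs without dependencies; in practice each per-gene list would be converted to an N x D numpy.ndarray before being passed to SSAMDataset(genes, locations, width, height). The gene names and coordinates are made-up example values, group_locations is a hypothetical helper, and taking width and height from the largest coordinates is an assumption.

```python
def group_locations(records, genes):
    """Group (gene, x, y) records into one list of [x, y] rows per gene."""
    by_gene = {gene: [] for gene in genes}
    for gene, x, y in records:
        if gene in by_gene:
            by_gene[gene].append([float(x), float(y)])
    # Keep the same order as the genes list.
    return [by_gene[gene] for gene in genes]


# Made-up example spots: (gene, x in um, y in um).
records = [("Gad1", 1.0, 2.0), ("Slc17a7", 3.5, 0.5), ("Gad1", 4.0, 4.0)]
genes = ["Gad1", "Slc17a7"]
locations = group_locations(records, genes)
width = max(x for _, x, _ in records)
height = max(y for _, _, y in records)
```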
-
get_celltype_correlation
(idx)[source]¶ Get the correlation values of a cell type map, computed between the given cluster’s centroid and the vector field.
Parameters: idx (int) – Index of a cluster.
Returns: Correlation values of the cell type map for the specified cluster’s centroid.
Return type: numpy.ndarray
-
plot_celltype_composition
(domain_index, cell_type_colors=None, cell_type_cmap='jet', cell_type_orders=None, label_cutoff=0.03, pctdistance=1.15, **kwargs)[source]¶ Plot composition of cell types in each domain.
Parameters: - domain_index (int) – Index of the domain.
- cell_type_colors (str or list(float)) – The colors of the cell types. Overrides cell_type_cmap parameter.
- cell_type_cmap (str or matplotlib.colors.Colormap) – The colormap for the cell types.
- label_cutoff (float) – The minimum cutoff of the labeling of the percentage. From 0 to 1.
- pctdistance (float) – The distance from center of the pie to the labels.
- kwargs – More keyword arguments for matplotlib.pyplot.pie.
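The effect of label_cutoff can be sketched as a rule for which pie wedges receive a percentage label. This is an assumed reading of the parameter, not code from ssam: composition_labels is a hypothetical helper and the inclusive comparison is a guess.

```python
def composition_labels(counts, label_cutoff=0.03):
    """Return a percentage label per cell type, blank below the cutoff."""
    total = sum(counts)
    labels = []
    for count in counts:
        fraction = count / total if total else 0.0
        if fraction >= label_cutoff:
            labels.append("{:.1f}%".format(100 * fraction))
        else:
            labels.append("")  # too small to label
    return labels


print(composition_labels([50, 48, 2]))
```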
-
plot_celltypes_map
(background='black', centroid_indices=[], colors=None, cmap='jet', rotate=0, min_r=0.6, set_alpha=False, z=None)[source]¶ Plot the merged cell-type map.
Parameters: - background (str or list(float)) – Set background color of the cell-type map.
- centroid_indices (list(int)) – The centroids which will be in the cell type map. If not given, the cell-type map is drawn with all centroids.
- colors (list(str), list(list(float))) – Color of the clusters. Overrides cmap parameter.
- cmap (str or matplotlib.colors.Colormap) – Colormap for the clusters.
- rotate (int) – Rotate the plot. Possible values are 0, 1, 2, and 3.
- min_r (float) – Minimum correlation threshold for the cell-type map. This value is used only for plotting and does not affect the cell-type maps generated by filter_celltypemaps.
- set_alpha (bool) – Set the alpha of each pixel based on the correlation. Not fully implemented yet; it does not work properly with a background other than black.
- z (int) – Z index to slice 3D cell-type map. If not given, the slice at the middle will be used.
-
plot_correlation_map
(cmap='hot')[source]¶ Plot the correlations near the vectors in the vector field (Not fully implemented yet).
Parameters: cmap – Colormap for the image.
-
plot_diagnostic_plot
(centroid_index, cluster_name=None, cluster_color=None, cmap=None, rotate=0, z=None, use_embedding='tsne', known_signatures=[], correlation_methods=[])[source]¶ Plot the diagnostic plot. This method requires that plot_tsne or plot_umap has been run at least once beforehand.
Parameters: - centroid_index (int) – Index of the centroid for the diagnostic plot.
- cluster_name (str) – The name of the cluster.
- cluster_color (str or list(float)) – The color of the cluster. Overrides cmap parameter.
- cmap (str or matplotlib.colors.Colormap) – The colormap for the clusters. The cluster color is determined using the centroid_index th color of the given colormap.
- rotate (int) – Rotate the plot. Possible values are 0, 1, 2, and 3.
- z (int) – Z index to slice 3D vector norm and cell-type map plots. If not given, the slice at the middle will be used.
- use_embedding (str) – The type of the embedding for the last panel. Possible values are “tsne” or “umap”.
- known_signatures (list(tuple)) – The list of known signatures, which will be displayed in the 3rd panel. Each signature can be 3-tuple or 4-tuple, containing 1) the name of signature, 2) gene labels of the signature, 3) gene expression values of the signature, 4) optionally the color of the signature.
- correlation_methods (list(tuple)) – The correlation methods used to determine the max correlation of the centroid to the known_signatures. Each method should be a 2-tuple, containing 1) the name of the correlation, 2) the correlation function (compatible with the correlation methods available in scipy.stats).
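The tuple layouts for known_signatures and correlation_methods can be shown with made-up values. Everything in this sketch is illustrative: the signature names, gene labels and expression values are invented, and pearson_r is a hypothetical stand-in that returns an (r, p) pair in the style of scipy.stats.pearsonr without computing a p-value. Whether ssam uses only the first element of the return value is an assumption.

```python
import math


def pearson_r(x, y):
    """Return (r, p) shaped like scipy.stats.pearsonr; p is not computed."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy), float("nan")


# 3-tuple: (name, gene labels, expression values);
# 4-tuple: the same plus a color. All values are made up.
known_signatures = [
    ("Type A (example)", ["GeneA", "GeneB", "GeneC"], [5.0, 0.5, 0.1]),
    ("Type B (example)", ["GeneA", "GeneB", "GeneC"], [0.2, 4.0, 0.3], "tab:red"),
]
# 2-tuple: (name of the correlation, correlation function).
correlation_methods = [("r", pearson_r)]

centroid = [4.0, 1.0, 0.2]  # made-up centroid over the same three genes
name, func = correlation_methods[0]
best = max(known_signatures, key=lambda sig: func(centroid, sig[2])[0])
print(best[0])
```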
-
plot_domains
(background='white', colors=None, cmap='jet', rotate=0, domain_background=False, background_alpha=0.3, z=None)[source]¶ Plot tissue domains.
Parameters: - background (str or list(float)) – Background color of the plot.
- colors (list(str), list(list(float))) – Color of the domains. Overrides cmap parameter.
- cmap (str or matplotlib.colors.Colormap) – Colormap for the domains.
- rotate (int) – Rotate the plot. Possible values are 0, 1, 2, and 3.
- domain_background (bool) – Show the area of the inferred domains behind the domain map.
- background_alpha (float) – The alpha value of the area of the inferred domains.
- z (int) – Z index to slice 3D domain map. If not given, the slice at the middle will be used.
-
plot_expanded_mask
(cmap='Greys')[source]¶ Plot the expanded area of the vectors (Not fully implemented yet).
Parameters: cmap – Colormap for the mask.
-
plot_l1norm
(cmap='viridis', rotate=0, z=None)[source]¶ Plot the L1-norm of the vector field.
Parameters:
-
plot_spatial_relationships
(cluster_labels, *args, **kwargs)[source]¶ Plot spatial relationship between cell types, presented as a heatmap.
Parameters:
-
plot_tsne
(run_tsne=False, pca_dims=10, n_iter=5000, perplexity=70, early_exaggeration=10, metric='correlation', exclude_bad_clusters=True, s=None, random_state=0, colors=[], excluded_color='#00000033', cmap='jet', tsne_kwargs={})[source]¶ Scatter plot the tSNE embedding.
Parameters: - run_tsne (bool) – If false, this method tries to load precomputed tSNE result before running tSNE.
- pca_dims (int) – Number of PCA dimensions used for the tSNE embedding.
- n_iter (int) – Maximum number of iterations for the tSNE.
- perplexity (float) – The perplexity value of the tSNE (please refer to the section How should I set the perplexity in t-SNE? in this link).
- early_exaggeration (float) – Early exaggeration parameter for tSNE. Controls the tightness of the resulting tSNE plot.
- metric (str) – Metric for calculation of distance between vectors in gene expression space.
- exclude_bad_clusters (bool) – If true, the vectors that are excluded by the clustering algorithm will not be considered for tSNE computation.
- s (float) – Size of the scatter dots.
- random_state (int or random state object) – Random seed or scikit-learn’s random state object to replicate the same result
- colors (list(str), list(list(float))) – Color of each cluster.
- excluded_color (str or list(float)) – Color of the vectors excluded by the clustering algorithm.
- cmap (str or matplotlib.colors.Colormap) – Colormap for the clusters.
- tsne_kwargs (dict) – Other keyword parameters for tSNE.
-
plot_umap
(run_umap=False, pca_dims=10, metric='correlation', exclude_bad_clusters=True, s=None, random_state=0, colors=[], excluded_color='#00000033', cmap='jet', umap_kwargs={})[source]¶ Scatter plot the UMAP embedding.
Parameters: - run_umap (bool) – If false, this method tries to load precomputed UMAP result before running UMAP.
- pca_dims (int) – Number of PCA dimensions used for the UMAP embedding.
- metric (str) – Metric for calculation of distance between vectors in gene expression space.
- exclude_bad_clusters (bool) – If true, the vectors that are excluded by the clustering algorithm will not be considered for UMAP computation.
- s (float) – Size of the scatter dots.
- random_state (int or random state object) – Random seed or scikit-learn’s random state object to replicate the same result
- colors (list(str), list(list(float))) – Color of each cluster.
- excluded_color (str or list(float)) – Color of the vectors excluded by the clustering algorithm.
- cmap (str or matplotlib.colors.Colormap) – Colormap for the clusters.
- umap_kwargs (dict) – Other keyword parameters for UMAP.
-
vf
¶ Vector field as a numpy.ndarray.
-
ssam.
run_sctransform
(data, clip_range=None, verbose=True, debug_path=None, plot_model_pars=False, **kwargs)[source]¶ Run the ‘sctransform’ R package and return the normalized matrix and the model parameters. The ‘feather’ package is used for the data exchange between R and Python.
Parameters: - data (numpy.ndarray) – N x D ndarray to normalize (N is number of samples, D is number of dimensions).
- kwargs – Any keyword arguments passed to the R function vst.
Returns: A 2-tuple, which contains two pandas.DataFrame objects:
- normalized N x D matrix.
- determined model parameters.