Data Preparation¶
Download VISp data¶
In this tutorial we work with data from the murine primary visual cortex (VISp), profiled using multiplexed smFISH. Further details are available in the SSAM publication (Park et al., 2019).
First, download the data and unpack it:
curl "https://zenodo.org/record/3478502/files/supplemental_data_ssam_2019.zip?download=1" -o zenodo.zip
unzip zenodo.zip
Load data into python¶
Let’s start by loading our Python packages:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import ssam
Now we can load the mRNA spot table. Each row describes one mRNA spot and the columns contain its coordinates and target gene. We load the required columns into a dataframe:
df = pd.read_csv(
    "zenodo/multiplexed_smFISH/raw_data/smFISH_MCT_CZI_Panel_0_spot_table.csv",
    usecols=['x', 'y', 'z', 'target'])
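The following steps rely on the table having exactly the columns x, y, z and target. As a minimal sketch, the loading step can be wrapped in a small check. The helper name `load_spot_table`, its `column_map` parameter, and the toy CSV with its column names are hypothetical and only serve as an illustration:

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ["x", "y", "z", "target"]


def load_spot_table(source, column_map=None):
    """Read a spot table and return only the x, y, z and target columns.

    `column_map` (hypothetical) maps the file's own column names to the
    names used in this tutorial, e.g. {"gene": "target"}.
    """
    table = pd.read_csv(source)
    if column_map:
        table = table.rename(columns=column_map)
    missing = [c for c in REQUIRED_COLUMNS if c not in table.columns]
    if missing:
        raise ValueError(f"spot table is missing columns: {missing}")
    return table[REQUIRED_COLUMNS]


# Made-up stand-in for a CSV file whose columns are named differently.
toy_csv = io.StringIO(
    "pos_x,pos_y,pos_z,gene,quality\n"
    "105.0,220.5,3.0,GeneA,0.9\n"
    "310.2,118.0,4.0,GeneB,0.8\n"
)
toy_df = load_spot_table(
    toy_csv,
    column_map={"pos_x": "x", "pos_y": "y", "pos_z": "z", "gene": "target"},
)
print(toy_df)
```

Raising an error early on a missing column is a design choice here: it avoids a less obvious failure later, when the coordinates are transformed or grouped.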
If your dataset is organized differently, you will have to reshape it before continuing with the next steps.
## Transform Data
Because SSAM analyzes the data at cellular scale, we convert the coordinates from the laboratory system into micrometers. We also shift them so that the smallest value along each axis is 10 µm:
um_per_pixel = 0.1
df.x = (df.x - df.x.min()) * um_per_pixel + 10
df.y = (df.y - df.y.min()) * um_per_pixel + 10
df.z = (df.z - df.z.min()) * um_per_pixel + 10
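To see what this transformation does, here is a sketch that applies the same arithmetic to a small made-up table. The function name `to_micrometers`, its `margin_um` parameter, and the toy values are assumptions for illustration; only the formula is taken from the lines above:

```python
import pandas as pd


def to_micrometers(spots, um_per_pixel=0.1, margin_um=10.0):
    """Return a copy whose x, y, z start at `margin_um` and are scaled to µm."""
    out = spots.copy()
    for axis in ("x", "y", "z"):
        # Shift so the minimum is zero, scale to µm, then add the margin.
        out[axis] = (spots[axis] - spots[axis].min()) * um_per_pixel + margin_um
    return out


# Made-up spot table in laboratory (pixel) coordinates.
toy = pd.DataFrame({
    "x": [1000.0, 1500.0, 3000.0],
    "y": [200.0, 200.0, 700.0],
    "z": [5.0, 15.0, 10.0],
    "target": ["GeneA", "GeneB", "GeneA"],
})
toy_um = to_micrometers(toy)
print(toy_um)
```

Unlike the in-place assignments above, this sketch returns a copy, so the original table stays available for comparison.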
Prepare data for SSAM¶
To create a SSAMDataset object we need to provide four arguments:
- a list of gene names profiled in the experiment: genes
- a list of lists that contains the coordinates of each gene: coord_list
- the width of the image
- the height of the image
The width and height can be inferred from the extent of the spot coordinates:
width = df.x.max() - df.x.min() + 10
height = df.y.max() - df.y.min() + 10
We group the dataframe by gene and create the list of gene names:
grouped = df.groupby('target').agg(list)
genes = list(grouped.index)
And finally the coordinate list:
coord_list = []
for target, coords in grouped.iterrows():
    coord_list.append(np.array(list(zip(*coords))))
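The grouping and zipping can be hard to picture, so here is the same pattern on a made-up four-spot table. The names `toy`, `toy_grouped`, `toy_genes` and `toy_coord_list` are illustrative only. Each entry of the resulting list is an array with one row per spot and the columns x, y, z:

```python
import numpy as np
import pandas as pd

# Made-up spot table, already in micrometers.
toy = pd.DataFrame({
    "x": [10.0, 60.0, 210.0, 35.0],
    "y": [10.0, 10.0, 60.0, 42.0],
    "z": [10.0, 11.0, 10.5, 10.0],
    "target": ["GeneB", "GeneA", "GeneB", "GeneA"],
})

# One row per gene; each cell holds the list of that gene's x, y or z values.
toy_grouped = toy.groupby("target").agg(list)
toy_genes = list(toy_grouped.index)

# zip(*coords) turns the three per-axis lists into (x, y, z) tuples per spot.
toy_coord_list = [
    np.array(list(zip(*coords))) for _, coords in toy_grouped.iterrows()
]

for gene, coords in zip(toy_genes, toy_coord_list):
    print(gene, coords.shape)
```

Because `groupby` sorts its keys by default, the gene names and the coordinate arrays come out in the same alphabetical order.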
Create the SSAMDataset object¶
With everything in place we can now instantiate the SSAMDataset object:
ds = ssam.SSAMDataset(genes, coord_list, width, height)
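If the constructor call fails, or the later results look off, the four inputs can be checked with a few lines of NumPy. The helper below is hypothetical and not part of ssam. It also assumes that every spot should lie inside the image area, which is our reading of the setup above rather than a documented requirement:

```python
import numpy as np


def check_ssam_inputs(genes, coord_list, width, height):
    """Hypothetical helper (not part of ssam): basic consistency checks."""
    if len(genes) != len(coord_list):
        raise ValueError("genes and coord_list must have the same length")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    for gene, coords in zip(genes, coord_list):
        coords = np.asarray(coords, dtype=float)
        # As built in this tutorial: one row per spot, columns x, y, z.
        if coords.ndim != 2 or coords.shape[1] != 3:
            raise ValueError(f"{gene}: expected an (n_spots, 3) array")
        # Assumption: spots should fall inside the width x height area.
        if coords[:, 0].min() < 0 or coords[:, 0].max() > width:
            raise ValueError(f"{gene}: x coordinates outside [0, width]")
        if coords[:, 1].min() < 0 or coords[:, 1].max() > height:
            raise ValueError(f"{gene}: y coordinates outside [0, height]")
    return True


# Made-up inputs in the same layout as genes and coord_list above.
toy_genes = ["GeneA", "GeneB"]
toy_coord_list = [
    np.array([[60.0, 10.0, 11.0], [35.0, 42.0, 10.0]]),
    np.array([[10.0, 10.0, 10.0], [210.0, 60.0, 10.5]]),
]
print(check_ssam_inputs(toy_genes, toy_coord_list, width=210.0, height=60.0))
```

With the tutorial's own variables, the call would be `check_ssam_inputs(genes, coord_list, width, height)`.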
Now we can start the analysis with the kernel density estimation step.