API Reference
Preprocessing files
Utilities for annotating reads in an allinfo file with cell types and constructing sparse isoform matrices.
This module provides functions to:
Label reads with celltype-region labels from segmentation data
Generate auxiliary CSV outputs for scisorseqr (isoform IDs, counts per cluster)
Build sparse cell-by-isoform matrices for downstream analysis
Combine two-slide experiments
Load sparse matrices into DataFrames
- SplIsoFind.preprocess.allinfo_addct(fn_allinfo, fn_CIDmap, fn_adata)
Annotate a raw read AllInfo file with cell-type-region labels and save a filtered copy.
- Parameters:
fn_allinfo (str) – Path to raw AllInfo TSV file (index_col=0, no header).
fn_CIDmap (str) – Path to TSV file with mapping of barcodes to original CellIDs.
fn_adata (str) – Path to AnnData .h5ad file with obs containing ‘first_type’ (cell type), ‘spot_class’ (whether a cell is a singlet), and ‘subregion’.
- Returns:
Writes <fn_allinfo>.filtered.labeled.gz in the same format.
- Return type:
None
- SplIsoFind.preprocess.constructSparseMatrix(allinfo, fn_CIDmap, gene_isoform_count)
Build a cell-by-isoform sparse matrix of relative expression values.
- Parameters:
allinfo (pandas.DataFrame) – Filtered AllInfo with reads.
fn_CIDmap (str) – Path to barcode-to-CellID mapping TSV.
gene_isoform_count (pandas.DataFrame) – DataFrame with columns [Gene, Isoform, count].
- Return type:
tuple[csr_matrix, list]
- Returns:
x_sparse (csr_matrix) – Shape (n_cells, n_transcripts), values = PSI per cell.
gene_isoform_list (list of tuple) – List of (Gene, Isoform) pairs matching matrix columns.
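A minimal sketch of the PSI construction described above, assuming a hypothetical long-format read table with columns cell, gene and isoform (one row per read); the real function works from the AllInfo columns and the barcode map instead, and psi_matrix_sketch is an illustrative name. For every (cell, gene) pair with at least one read, each isoform of that gene gets an explicitly stored value (possibly 0), while pairs without reads stay unstored.

```python
import pandas as pd
from scipy.sparse import csr_matrix


def psi_matrix_sketch(reads, cells, gene_isoform_list):
    """Hypothetical helper: PSI = isoform reads / gene reads, per cell.

    reads: DataFrame with columns ['cell', 'gene', 'isoform'], one row per read.
    gene_isoform_list: list of (gene, isoform) tuples defining the columns.
    """
    cell_idx = {c: i for i, c in enumerate(cells)}
    col_idx = {gi: j for j, gi in enumerate(gene_isoform_list)}
    isos_by_gene = {}
    for gene, iso in gene_isoform_list:
        isos_by_gene.setdefault(gene, []).append(iso)

    iso_counts = reads.groupby(["cell", "gene", "isoform"]).size().to_dict()
    gene_counts = reads.groupby(["cell", "gene"]).size()

    rows, cols, vals = [], [], []
    for (cell, gene), total in gene_counts.items():
        if cell not in cell_idx or gene not in isos_by_gene:
            continue  # cell or gene is not part of the matrix
        for iso in isos_by_gene[gene]:
            rows.append(cell_idx[cell])
            cols.append(col_idx[(gene, iso)])
            # An isoform without reads in a covered gene becomes an explicit 0
            vals.append(iso_counts.get((cell, gene, iso), 0) / total)

    return csr_matrix((vals, (rows, cols)),
                      shape=(len(cells), len(gene_isoform_list)))
```

In the result, nnz counts the explicit zeros, which is what lets downstream code tell "observed with PSI = 0" apart from "not observed".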
- SplIsoFind.preprocess.create_auxiliary_files(fn_allinfo, output_dir)
Generate isoform ID mapping and counts-per-cluster files as input for scisorseqr.
- Parameters:
fn_allinfo (str) – Path to labeled AllInfo TSV file (index_col=0, no header).
output_dir (str) – Directory to write ‘Iso-IsoID.csv’ and ‘NumIsoPerCluster’ files.
- Return type:
None
- SplIsoFind.preprocess.create_isoform_matrix(fn_allinfo, fn_CIDmap, fn_adata, output, mincells=50, mincellspergroup=20)
Build and save a filtered cell-by-isoform sparse matrix for a single slide.
This function:
Loads read-level data via readAllInfo.
Filters genes with at least mincells total reads.
Filters isoforms with at least mincellspergroup reads.
Retains genes with ≥2 isoforms after filtering.
Constructs a cell-by-isoform sparse PSI matrix.
Filters matrix columns by the same cell count thresholds.
Saves:
X_sparse.npz: CSR matrix of PSI values.
genes_isoforms.csv: List of (Gene, Isoform) pairs.
labels.csv: Cell metadata from AnnData.
- Parameters:
fn_allinfo (str) – Path to labeled AllInfo TSV (filtered by allinfo_addct), index_col=0.
fn_CIDmap (str) – Path to TSV mapping barcodes to original CellIDs.
fn_adata (str) – Path to AnnData .h5ad file for cell ordering and labels.
output (str) – Directory to write output files (X_sparse.npz, etc.).
mincells (int, default=50) – Minimum total reads per isoform to keep a gene.
mincellspergroup (int, default=20) – Minimum reads in high/low group per isoform to include.
- Return type:
None
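The gene and isoform filtering steps listed above can be approximated on a [Gene, Isoform, count] table (the format constructSparseMatrix takes). This sketch assumes mincells is compared against each gene's summed count and mincellspergroup against each isoform's count; the exact quantities the real function thresholds may differ, and filter_gene_isoforms is a hypothetical name.

```python
import pandas as pd


def filter_gene_isoforms(gene_isoform_count, mincells=50, mincellspergroup=20):
    """Hypothetical filter on a DataFrame with columns [Gene, Isoform, count]."""
    df = gene_isoform_count
    # 1. keep genes whose isoform counts sum to at least `mincells`
    gene_total = df.groupby("Gene")["count"].transform("sum")
    df = df[gene_total >= mincells]
    # 2. keep isoforms with at least `mincellspergroup` counts
    df = df[df["count"] >= mincellspergroup]
    # 3. keep genes that still have two or more isoforms
    n_iso = df.groupby("Gene")["Isoform"].transform("nunique")
    return df[n_iso >= 2].reset_index(drop=True)
```

Step 3 matters because a gene left with a single isoform has a PSI of 1 in every covered cell and carries no splicing information.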
- SplIsoFind.preprocess.create_isoform_matrix_twoslides(fn_allinfo_S1, fn_allinfo_S2, fn_CIDmap_S1, fn_CIDmap_S2, fn_adata_S1, fn_adata_S2, output, mincells=50, mincellspergroup=20)
Build and save a filtered cell-by-isoform sparse matrix combining two slides.
This function performs the same filtering and matrix construction as create_isoform_matrix, but for two separate slides whose results are then vertically concatenated:
Load and concatenate readAllInfo outputs for slide1 and slide2.
Apply gene and isoform read count filters.
Construct separate sparse matrices via constructSparseMatrix.
Subset each matrix by cell order from respective AnnData.
Vertically stack matrices and concatenate labels.
Apply cell-count filters on the combined matrix.
Save outputs similarly to single-slide version.
- Parameters:
fn_allinfo_S1 (str) – Path to the filtered AllInfo TSV for slide 1.
fn_allinfo_S2 (str) – Path to the filtered AllInfo TSV for slide 2.
fn_CIDmap_S1 (str) – Path to the barcode-to-CellID TSV for slide 1.
fn_CIDmap_S2 (str) – Path to the barcode-to-CellID TSV for slide 2.
fn_adata_S1 (str) – Path to the AnnData .h5ad file for slide 1.
fn_adata_S2 (str) – Path to the AnnData .h5ad file for slide 2.
output (str) – Directory to write output files (X_sparse.npz, etc.).
mincells (int, default=50) – Minimum total reads per isoform to keep a gene.
mincellspergroup (int, default=20) – Minimum reads in high/low group per isoform to include.
- Return type:
None
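The vertical stacking step can be sketched with scipy.sparse.vstack, assuming both per-slide matrices already share the same isoform columns in the same order; stack_slides is a hypothetical helper. vstack concatenates the stored entries, so explicit zeros survive. If cell IDs can collide between slides, prefixing them with a slide tag before concatenating is one option; the reference does not say how this is handled.

```python
import pandas as pd
from scipy.sparse import csr_matrix, vstack


def stack_slides(x1, labels1, x2, labels2):
    """Hypothetical helper: stack two cells-by-isoforms CSR matrices row-wise."""
    if x1.shape[1] != x2.shape[1]:
        raise ValueError("both slides must share the same isoform columns")
    # Stored entries are concatenated, so explicit zeros are preserved
    x = vstack([x1, x2], format="csr")
    labels = pd.concat([labels1, labels2])
    return x, labels
```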
- SplIsoFind.preprocess.load_sparse(input_dir)
Load a saved sparse isoform matrix, labels, and genes/isoform mapping.
- Parameters:
input_dir (str) –
Directory containing output files from create_isoform_matrix*. Expected files:
X_sparse.npz: sparse matrix (cells × isoforms)
labels.csv: cell metadata
genes_isoforms.csv: isoform-to-gene mapping
- Return type:
tuple[csr_matrix, DataFrame, DataFrame]
- Returns:
x (csr_matrix) – Sparse matrix of cell-by-isoform relative expression. Zeros are stored explicitly; unstored (implicit) entries represent missing values (NaN).
labels (pandas.DataFrame) – Cell metadata indexed by cell identifier.
gene_isoform (pandas.DataFrame) – Gene and isoform identifiers (column metadata for x). Expected columns: ‘Gene ID’, ‘Transcript ID’.
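Because stored versus unstored is what separates an observed PSI of 0 from a missing value, the on-disk format has to keep explicit zeros. As an illustration, assuming X_sparse.npz is written with scipy.sparse.save_npz (the file name suggests this, but the reference does not state it), a round trip preserves them:

```python
import os
import tempfile

from scipy.sparse import csr_matrix, load_npz, save_npz

# Toy 2 x 3 matrix: (0, 0) = 0.25, (0, 1) = explicit 0.0, (1, 2) = 1.0;
# every other entry is unstored, i.e. "not observed".
x = csr_matrix(([0.25, 0.0, 1.0], [0, 1, 2], [0, 2, 3]), shape=(2, 3))

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "X_sparse.npz")
    save_npz(path, x)          # stores data/indices/indptr as-is
    x_loaded = load_npz(path)  # rebuilds the same CSR structure

print(x.nnz, x_loaded.nnz)  # both counts include the explicit zero
```

Calling eliminate_zeros() on such a matrix would silently turn observed zeros into missing values, so it should be avoided on these matrices.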
- SplIsoFind.preprocess.readAllInfo(fn_allinfo)
Load and filter AllInfo table (remove ‘None’ isoforms).
- Parameters:
fn_allinfo (str) – Path to AllInfo TSV (index_col=0, no header).
- Return type:
pandas.DataFrame
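A minimal sketch of the described load-and-filter behavior. The position of the isoform column is not given here, so isoform_col is a hypothetical parameter, as is the function name. Recent pandas versions parse the literal string "None" as NaN by default while older ones keep it as text, so the sketch drops both.

```python
import io

import pandas as pd


def read_allinfo_sketch(path_or_buffer, isoform_col):
    """Hypothetical reader: load an AllInfo-style TSV and drop 'None' isoforms.

    isoform_col is the (assumed) integer label of the isoform column after
    column 0 has been consumed as the index.
    """
    df = pd.read_csv(path_or_buffer, sep="\t", header=None, index_col=0)
    iso = df[isoform_col]
    # Handle both NaN (recent pandas) and the literal string "None"
    keep = iso.notna() & (iso.astype(str) != "None")
    return df[keep]


example = io.StringIO("r1\tc1\tG1\tT1\nr2\tc1\tG1\tNone\nr3\tc2\tG2\tT3\n")
print(read_allinfo_sketch(example, isoform_col=3).index.tolist())  # ['r1', 'r3']
```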
- SplIsoFind.preprocess.sparse2df(input_dir)
Load a saved sparse isoform matrix into dense DataFrame form.
This function:
Reads labels.csv and genes_isoforms.csv.
Loads X_sparse.npz as a CSR matrix.
Converts unstored (implicit) entries to NaN via a mask, keeping explicitly stored zeros as 0.
Returns:
x: DataFrame of PSI values (cells × isoforms).
labels: DataFrame of cell metadata.
- Parameters:
input_dir (str) – Directory containing output files from create_isoform_matrix*.
- Return type:
tuple[DataFrame, DataFrame]
- Returns:
x (pandas.DataFrame) – Cell-by-isoform relative expression matrix (NaN where there are zero counts).
labels (pandas.DataFrame) – Cell metadata indexed by cell identifier.
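The mask-based conversion can be sketched as follows, assuming the convention stated for load_sparse (stored entries, including explicit zeros, are observed values; unstored entries are missing). sparse_to_dense_nan is a hypothetical name, not part of the package.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def sparse_to_dense_nan(x, index, columns):
    """Dense copy of a CSR PSI matrix with NaN for unstored entries."""
    dense = np.full(x.shape, np.nan)
    coo = x.tocoo()  # keeps explicit zeros as stored entries
    dense[coo.row, coo.col] = coo.data
    return pd.DataFrame(dense, index=index, columns=columns)
```

Calling x.toarray() instead would fill unstored entries with 0 and lose the distinction, which is why the mask is needed.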
Detecting spatially variable isoforms
Spatial autocorrelation functions using Moran’s I.
This module provides functions to compute Moran’s I for long-read spatial transcriptomics data, including permutation-based significance testing and FDR correction.
- SplIsoFind.spatially_variable.calculate_weight_matrix_sklearn(locations, k)
Build a libpysal spatial weights matrix via k-nearest neighbors.
- Parameters:
locations (pandas.DataFrame) – Coordinates of cells, shape (n_cells, 2), columns = [x, y].
k (int) – Number of neighbors to include (self excluded).
- Returns:
libpysal.weights.W object with equal weights for each of the k neighbors.
- Return type:
W
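The function name suggests scikit-learn for the neighbor search, with the result wrapped in a libpysal W object. As a dependency-light illustration, this sketch swaps in scipy.spatial.cKDTree and returns a plain scipy.sparse matrix with row-standardized weights (each neighbor gets 1/k). Whether the real function row-standardizes or uses binary weights is not stated, and knn_weights is a hypothetical name. It assumes more cells than k.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor weights, self excluded."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # Ask for k + 1 neighbors because each point is usually its own nearest one
    _, idx = cKDTree(coords).query(coords, k=k + 1)
    neighbors = np.empty((n, k), dtype=int)
    for i in range(n):
        others = idx[i][idx[i] != i]  # drop self (robust to duplicate coordinates)
        neighbors[i] = others[:k]
    rows = np.repeat(np.arange(n), k)
    data = np.full(n * k, 1.0 / k)
    return csr_matrix((data, (rows, neighbors.ravel())), shape=(n, n))
```

Note that k-nearest-neighbor weights are generally not symmetric: a cell can be among another cell's neighbors without the reverse being true.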
- SplIsoFind.spatially_variable.moransI(x, labels, nperm=100000, k=10, mincells=50, imb=0.05, mincellspergroup=20, celltypes=['All', 'ExciteNeuron', 'InhibNeuron', 'Astro', 'Oligo'], x_coord='x', y_coord='y', output_dir='')
Compute Moran’s I scores, p-values, and q-values (Benjamini-Yekutieli FDR-corrected) for each isoform and cell type.
- Parameters:
x (pandas.DataFrame) – Feature matrix of shape (n_cells, n_isoforms), with relative expression values between 0 and 1.
labels (pandas.DataFrame) – Cell metadata; must contain columns for x_coord, y_coord, ‘spot_class’, and ‘first_type’. ‘spot_class’ and ‘first_type’ are used to filter cells per cell type.
nperm (int, default=100000) – Number of permutations for significance testing.
k (int, default=10) – Number of nearest neighbors for spatial weight matrix.
mincells (int, default=50) – Minimum number of cells required to have relative expression values for an isoform.
imb (float, default=0.05) – Minimum ratio of minority group to total.
mincellspergroup (int, default=20) – Minimum number of cells per binary group (x > 0.5 or ≤ 0.5).
celltypes (list of str) – List of cell type labels to test; ‘All’ computes on all cells.
x_coord (str) – Column name in labels for the x coordinate.
y_coord (str) – Column name in labels for the y coordinate.
output_dir (str) – Directory path to save Moran’s I scores, p-values, and q-values as CSV files.
- Return type:
tuple[DataFrame, DataFrame, DataFrame]
- Returns:
mI (pandas.DataFrame) – Moran’s I scores, index = isoforms, columns = celltypes.
pval (pandas.DataFrame) – Permutation p-values, index = isoforms, columns = celltypes.
qval (pandas.DataFrame) – FDR-corrected q-values, index = isoforms, columns = celltypes.
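The sketch below illustrates the three statistical pieces named above for a single isoform: Moran's I, I = (n / S0) · (zᵀWz) / (zᵀz) with z the mean-centered values and S0 the sum of all weights; a permutation p-value obtained by shuffling values across cells; and Benjamini-Yekutieli q-values across many tests. It is not the package's implementation. The one-sided p-value convention, (hits + 1) / (nperm + 1), is an assumption, and the mincells, imb and mincellspergroup filters and the per-cell-type loop are omitted.

```python
import numpy as np


def morans_i(x, W):
    """Moran's I for values x (length n) and an n x n weight matrix W."""
    z = np.asarray(x, dtype=float)
    z = z - z.mean()
    return (len(z) / W.sum()) * (z @ (W @ z)) / (z @ z)


def permutation_pvalue(x, W, nperm=999, seed=0):
    """One-sided permutation p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, W)
    hits = sum(morans_i(rng.permutation(x), W) >= observed for _ in range(nperm))
    return observed, (hits + 1) / (nperm + 1)


def by_qvalues(p):
    """Benjamini-Yekutieli adjusted p-values (q-values)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    c_m = np.sum(1.0 / ranks)  # harmonic correction for arbitrary dependence
    adjusted = p[order] * m * c_m / ranks
    # Enforce monotonicity, working down from the largest p-value
    adjusted = np.minimum.accumulate(adjusted[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(adjusted, 1.0)
    return q
```

For example, on a chain of 10 cells with binary neighbor weights and PSI values [0]*5 + [1]*5, I = 7/9; only 2 of the 252 distinct arrangements reach that value, so the permutation p-value is small.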
- SplIsoFind.spatially_variable.moransI_ctperm(x, labels, variables, nperm=10000, k=10, x_coord='x', y_coord='y', output_dir='')
Compute Moran’s I with cell-type constrained permutation.
Performs a two-phase test: the original Moran’s I permutation test (shuffling all cells) and a second permutation test that shuffles cells only within the same cell type.
- Parameters:
x (pandas.DataFrame) – Feature matrix of shape (n_cells, n_isoforms), with relative expression values between 0 and 1.
labels (pandas.DataFrame) – Cell metadata; must contain columns for x_coord, y_coord, ‘spot_class’, ‘first_type’, ‘first_type_weight’, and ‘second_type’.
variables (list of str) – Subset of columns in x to test.
nperm (int, default=10000) – Number of random permutations for cell-type assignment.
k (int, default=10) – Number of neighbors for spatial weight matrix.
x_coord (str) – Column name in labels for the x coordinate.
y_coord (str) – Column name in labels for the y coordinate.
output_dir (str) – Directory path to save the results as CSV files.
- Returns:
res – Index = tested isoforms, columns = [‘morans I’, ‘p-value (original)’, ‘p-value (new)’, ‘Num cells’, ‘Imbalance’].
- Return type:
pandas.DataFrame
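The cell-type constrained permutation can be sketched as shuffling PSI values only among cells of the same cell type, so that any spatial pattern fully explained by where cell types sit is preserved under the null. This is an illustrative reimplementation with hypothetical function names and an assumed one-sided p-value; it does not model how the real function uses ‘first_type_weight’ or ‘second_type’.

```python
import numpy as np


def morans_i(x, W):
    """Moran's I for values x (length n) and an n x n weight matrix W."""
    z = np.asarray(x, dtype=float)
    z = z - z.mean()
    return (len(z) / W.sum()) * (z @ (W @ z)) / (z @ z)


def permute_within_groups(values, groups, rng):
    """Shuffle values only among cells that share a group (cell type)."""
    values = np.asarray(values)
    groups = np.asarray(groups)
    out = values.copy()
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        out[idx] = values[rng.permutation(idx)]
    return out


def ct_constrained_pvalue(x, W, celltypes, nperm=999, seed=0):
    """Permutation p-value when shuffling is restricted to within cell types."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, W)
    hits = sum(
        morans_i(permute_within_groups(x, celltypes, rng), W) >= observed
        for _ in range(nperm)
    )
    return observed, (hits + 1) / (nperm + 1)
```

For instance, if an isoform's PSI is constant within each cell type, every constrained permutation reproduces the observed I, so the constrained p-value is 1 even when the unconstrained one is tiny.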
Visualizing results
Spatial plotting utilities for long-read spatial transcriptomics data.
This module provides functions to visualize spatial transcriptomics data as hexbin overlays, read tree-traversal isoform results, and generate heatmap tiles with accompanying bar plots.
- SplIsoFind.plotting.barplot_psi(x_sparse, labels, var_info, varName, celltype='', figsize=(6, 4), color='gray')
Plot a bar plot of the mean PSI value per brain region.
Extracts the specified transcript column from the sparse PSI matrix and computes the mean PSI per region for singlets only, optionally restricted to a given cell type.
- Parameters:
x_sparse (scipy.sparse.csr_matrix) – Sparse PSI matrix (cells × isoforms).
labels (pandas.DataFrame) – Cell metadata indexed by cell IDs. Must contain at least [‘region’, ‘spot_class’, ‘first_type’].
var_info (pandas.DataFrame) – Isoform metadata with at least a ‘Transcript ID’ column that matches x_sparse columns.
varName (str) – Name of the transcript to plot (must match a value in var_info[‘Transcript ID’]).
celltype (str, optional) – Filter for a specific cell type (matches labels[‘first_type’]). Default is “”, meaning no filtering by cell type.
figsize (tuple, default=(6, 4)) – Figure size, passed to psi_region_barplot.
color (str, default='gray') – Bar color, passed to psi_region_barplot.
- Returns:
fig – A bar plot showing mean PSI per region for singlets (and optionally per cell type).
- Return type:
matplotlib.figure.Figure
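The aggregation behind the bar plot can be sketched with a pandas groupby. This assumes the singlet label in ‘spot_class’ is the string "singlet" (an assumption) and that the PSI values are already aligned with labels.index, with NaN for unobserved cells; mean_psi_per_region is a hypothetical name.

```python
import numpy as np
import pandas as pd


def mean_psi_per_region(psi, labels, celltype=""):
    """Mean PSI per region over singlet cells, optionally one cell type only.

    psi: values aligned with labels.index; NaN marks unobserved cells.
    """
    df = labels.assign(psi=np.asarray(psi, dtype=float))
    mask = df["spot_class"] == "singlet"  # assumed label for singlet spots
    if celltype:
        mask &= df["first_type"] == celltype
    # groupby().mean() skips NaN, so unobserved cells do not pull means to 0
    return df.loc[mask].groupby("region")["psi"].mean()
```

The resulting Series can then be drawn with, for example, means.plot.bar(color="gray").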
- SplIsoFind.plotting.get_barplot_counts(allinfo, ct_comp_file)
Sum read counts per group based on a cell-type composition file.
- Parameters:
allinfo (pandas.DataFrame) – Raw info with subgroup IDs in column 2.
ct_comp_file (str) – Path to tab-separated file mapping groups and subgroups.
- Returns:
Index = group names, values = total counts.
- Return type:
pandas.Series
- SplIsoFind.plotting.get_countmatrix(res, region_map=None)
Build symmetric count and percent matrices from pairwise results.
- Parameters:
res (pandas.DataFrame) – Output of read_results, with columns [‘reg1’, ‘reg2’, ‘sig’, ‘perc’].
region_map (dict, optional) – Mapping from raw region codes to display labels.
- Returns:
count_matrix, percent_matrix – Square DataFrames indexed and columned by region names.
- Return type:
tuple of pandas.DataFrame
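Turning pairwise rows into symmetric matrices can be sketched by writing each (reg1, reg2) value into both the [reg1, reg2] and [reg2, reg1] cells. This is an illustrative version with a hypothetical name; pairs absent from res, including the diagonal, are left at 0 here, which may differ from the real function.

```python
import pandas as pd


def symmetric_matrices(res, region_map=None):
    """Symmetric count and percent matrices from pairwise (reg1, reg2) rows."""
    regions = sorted(set(res["reg1"]) | set(res["reg2"]))
    count = pd.DataFrame(0.0, index=regions, columns=regions)
    percent = pd.DataFrame(0.0, index=regions, columns=regions)
    for row in res.itertuples(index=False):
        # Mirror each pair so that matrix[a, b] == matrix[b, a]
        count.loc[row.reg1, row.reg2] = count.loc[row.reg2, row.reg1] = row.sig
        percent.loc[row.reg1, row.reg2] = percent.loc[row.reg2, row.reg1] = row.perc
    if region_map:
        count = count.rename(index=region_map, columns=region_map)
        percent = percent.rename(index=region_map, columns=region_map)
    return count, percent
```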
- SplIsoFind.plotting.get_text_color(rgba)
Choose black or white text based on background brightness.
- Parameters:
rgba (tuple) – RGBA color tuple.
- Returns:
‘white’ if brightness < 0.5 else ‘black’.
- Return type:
str
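The brightness measure is not specified. A sketch assuming Rec. 601 luma weights (0.299 R + 0.587 G + 0.114 B) on RGBA components in [0, 1], such as the tuples returned by calling a matplotlib Colormap, looks like this; text_color_for is a hypothetical name.

```python
def text_color_for(rgba):
    """Return 'white' on dark backgrounds and 'black' on light ones.

    Brightness uses Rec. 601 luma weights, an assumption; the real function
    may use a different brightness measure.
    """
    r, g, b = rgba[:3]
    brightness = 0.299 * r + 0.587 * g + 0.114 * b
    return "white" if brightness < 0.5 else "black"
```

Weighting green most heavily means a pure blue tile (brightness 0.114) gets white text, while a yellow tile (0.886) gets black text.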
- SplIsoFind.plotting.plot_heatmap(input_dir, dataset, region, celltype, allinfo, region_map=None, region_map2=None, figsize=(4, 4), cmap_count=<matplotlib.colors.LinearSegmentedColormap object>, cmap_percent=<matplotlib.colors.LinearSegmentedColormap object>, fontsize_tiles=11, fontsize_ticks=12, fn=None, vmax_count=None, vmax_perc=None)
Draw a lower-triangle heatmap with percentage and count triangles, plus a side bar plot.
- Parameters:
input_dir (str) – Base results directory.
dataset (str) – Dataset subfolder.
region (str) – Region identifier.
celltype (str) – Cell type identifier.
allinfo (pandas.DataFrame) – Raw info for barplot counts.
region_map (dict, optional) – Mapping for region labels.
region_map2 (dict, optional) – Mapping for region labels.
figsize (tuple) – Figure size.
cmap_count (Colormap) – Colormap for the count triangles.
cmap_percent (Colormap) – Colormap for the percent triangles.
fontsize_tiles (int, default=11) – Font size for tile text.
fontsize_ticks (int, default=12) – Font size for tick labels.
fn (str, optional) – Path to save figure; if None, figure is shown and not saved.
vmax_count (float, optional) – Manual maximum for the count color normalization.
vmax_perc (float, optional) – Manual maximum for the percent color normalization.
- Return type:
None
- SplIsoFind.plotting.read_results(input_dir, dataset, region, celltype)
Read and summarize isoform traversal results from scisorseqr output.
Parses input_dir/dataset/res_scisorseqr/CellTypes_{celltype}_{region}/TreeTraversal_Iso/*/results.csv, counts total tests and significant hits per subdirectory.
- Parameters:
input_dir (str) – Base directory for results.
dataset (str) – Name of dataset subfolder.
region (str) – Region identifier.
celltype (str) – Cell type identifier.
- Returns:
Columns: [‘reg1’, ‘reg2’, ‘tested’, ‘sig’, ‘perc’].
- Return type:
pandas.DataFrame
- SplIsoFind.plotting.spatial_hexplot(x, labels, varName, imarray=None, celltype='', region='', subregion='', hexsize=120, fig_size=(5, 5), ax=None, plot_lim=None, alpha=1, cmap='viridis', show_colorbar=True, staining_max='grey', staining_min='white', linewidths=0.1)
Overlay a hexbin of relative isoform expression values on a background staining image.
Extracts values for varName from x, subsets by celltype if provided, computes hexbin grid based on spot coordinates, and draws on ax or a new figure.
- Parameters:
x (pandas.DataFrame) – Feature matrix of shape (n_cells, n_isoforms), with relative expression values between 0 and 1.
labels (pandas.DataFrame) – Cell metadata with columns [‘x’, ‘y’, ‘first_type’, ‘spot_class’].
varName (str) – Name of the column in x to plot.
imarray (array-like or None) – Background image array; if None, only hexbin is shown.
celltype (str) – If non-empty, only singlet spots of this first_type are shown.
region (str) – If non-empty, only spots of this region are shown.
subregion (str) – If non-empty, only spots of this subregion are shown.
hexsize (int) – Approximate pixel diameter for hexbin cells.
fig_size (tuple) – Size of new figure if ax is None.
ax (matplotlib.axes.Axes or None) – Axis to draw on; if None, a new one is created.
plot_lim (tuple (xmin, xmax, ymin, ymax) or None) – If provided, crops both image and hex positions.
alpha (float) – Transparency for background image overlay.
cmap (str or Colormap) – Colormap for hexbin values.
show_colorbar (bool) – Whether to display a colorbar labeled “Fraction”.
staining_max (color spec) – Color to which the maximum background image intensity is mapped.
staining_min (color spec) – Color to which the minimum background image intensity is mapped.
linewidths (float) – Width of edges between hex cells.
- Returns:
ax – Axis containing the hexbin overlay.
- Return type:
matplotlib.axes.Axes
- SplIsoFind.plotting.spatial_hexplot_sparse(x_sparse, labels, var_info, varName, **kwargs)
Wrapper for spatial_hexplot that handles sparse input.
Extracts the requested isoform column from a CSR sparse matrix and constructs a DataFrame for compatibility with spatial_hexplot. Only explicitly stored values (including explicit zeros) are included in the plot.
- Parameters:
x_sparse (scipy.sparse.csr_matrix) – Sparse matrix of relative expression values (cells × isoforms). Explicit zeros are retained, missing values are implicit.
labels (pandas.DataFrame) – Cell metadata indexed by cell IDs. Must contain columns [‘x’, ‘y’, ‘first_type’, ‘spot_class’].
var_info (pandas.DataFrame) – Isoform metadata with at least a ‘Transcript ID’ column that matches x_sparse columns.
varName (str) – Name of the transcript to plot (must match a value in var_info[‘Transcript ID’]).
**kwargs – All additional parameters are passed directly to spatial_hexplot (e.g., imarray, fig_size, cmap, alpha, celltype, ax, etc.).
- Returns:
ax – The axis with the hexbin overlay.
- Return type:
matplotlib.axes.Axes
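The column-extraction step of this wrapper can be sketched by converting to CSC layout and reading the stored entries of one column directly, which keeps explicit zeros and skips unstored cells. This is an illustration with a hypothetical helper name, not the package's code.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def explicit_column(x_sparse, labels, var_info, varName):
    """Stored values of one isoform column, indexed by cell ID."""
    matches = np.flatnonzero(var_info["Transcript ID"].to_numpy() == varName)
    if matches.size == 0:
        raise KeyError(varName)
    j = int(matches[0])
    csc = x_sparse.tocsc()  # column access is cheap in CSC layout
    start, end = csc.indptr[j], csc.indptr[j + 1]
    rows = csc.indices[start:end]
    # Only stored entries (including explicit zeros) are returned
    return pd.Series(csc.data[start:end], index=labels.index[rows], name=varName)
```

The returned Series can be reindexed against labels.index to obtain a column with NaN for unobserved cells, which is the shape spatial_hexplot expects from a DataFrame column.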