Tutorial 6 De novo NOI discovery on Perturb-CAST

This tutorial trains QueST on the Perturb-CAST dataset, including six Visium slices from a hepatocellular-carcinoma mouse model under combinatorial in-vivo CRISPR perturbations. We extract niche embeddings, project them with UMAP, cluster them with Leiden, and use the resulting clusters to compute a Niche Prevalence Ratio (NPR) score that identifies locally-enriched niche types on a target slice.

[ ]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import sys
import logging
import warnings
import numpy as np
import pandas as pd
import scanpy as sc
import matplotlib.pyplot as plt
import seaborn as sns
import umap as umap_lib
[ ]:
import sys, os
import quest.utils as utils
from quest.trainer import QueSTTrainer
[ ]:
warnings.filterwarnings('ignore')
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.WARNING)

1. Load data

[ ]:
dataset = 'Perturb-CAST'
data_path = '../data/Perturb-CAST'
model_path = '../results/Perturb-CAST/model/quest_model.pth'
sample_ids = ['ML_I', 'ML_II_A_1', 'ML_II_B', 'ML_II_C', 'ML_III_A', 'ML_III_B']
adata_list = [sc.read_h5ad(f'{data_path}/{sid}.h5ad') for sid in sample_ids]
for adata, sid in zip(adata_list, sample_ids):
    adata.uns['library_id'] = sid
print(f'Loaded {len(adata_list)} slices: {[a.n_obs for a in adata_list]} spots')
Loaded 6 slices: [4217, 3870, 3956, 3608, 4134, 2697] spots

2. Train QueST

[ ]:
trainer = QueSTTrainer(
    dataset=dataset, data_path=data_path,
    sample_ids=sample_ids, adata_list=adata_list,
    query_niches=None, query_sample_id=None,
    model_path=model_path,
    epochs=15, save_model=True,
    hvg=4000, min_count=10, normalize=True,
    seed=2024,
)

We also provided pretrained QueST model checkpoint weights at https://cloud.tsinghua.edu.cn/d/ac873d6098ad4016818a/. To skip training and use pretrained checkpoint, simiply put it at corresponding model path and comment the “trainer.train()” line.

[ ]:
trainer.train()
trainer.inference(ckpt_path=model_path)
computing 3-hop subgraph (ML_I): 100%|██████████| 4217/4217 [00:07<00:00, 592.88it/s]
computing 3-hop subgraph (ML_II_A_1): 100%|██████████| 3870/3870 [00:05<00:00, 673.58it/s]
computing 3-hop subgraph (ML_II_B): 100%|██████████| 3956/3956 [00:05<00:00, 690.76it/s]
computing 3-hop subgraph (ML_II_C): 100%|██████████| 3608/3608 [00:05<00:00, 654.84it/s]
computing 3-hop subgraph (ML_III_A): 100%|██████████| 4134/4134 [00:05<00:00, 725.74it/s]
computing 3-hop subgraph (ML_III_B): 100%|██████████| 2697/2697 [00:03<00:00, 716.47it/s]
training: 100%|██████████| 15/15 [04:02<00:00, 16.18s/epoch]

3. Global nodule niche embedding UMAP by slice

[ ]:
global_adata = utils.get_global_niche_adata(trainer, group_key='histo_annotation', exclude_values=['normal_tissue'], min_count=3)
[ ]:
utils.plot_niche_umap(global_adata, color='slice', umap_key='X_umap')
_images/Tutorial_6_De_novo_NOI_discovery_on_Perturb-CAST_12_0.png

4. De novo NOI discovery and niche querying

We use slice ML_III_B as the source sample for NOI discovery, as it has the highest density of nodules.

[ ]:
source_sample = 'ML_III_B'
source_full = trainer.adata_list[sample_ids.index(source_sample)].copy()
utils.get_niche_pattern(global_adata)
cl_score = utils.compute_npr_score(source_full, global_adata, source_sample, min_local=10, high_threshold=3)
utils.plot_npr_spatial(source_full, score_key='npr_score', title=f'NPR score on {source_sample}')
_images/Tutorial_6_De_novo_NOI_discovery_on_Perturb-CAST_14_0.png

We then show the other niches that shares the same niche patterns with NOI.

[ ]:
mask = source_full.obs['npr_score'] > 6
noi_spots = [(source_sample, r, c) for r, c in zip(
    source_full.obs.loc[mask, 'array_row'],
    source_full.obs.loc[mask, 'array_col'])]
utils.plot_noi_query(trainer.adata_list, sample_ids, global_adata, noi_spots)
_images/Tutorial_6_De_novo_NOI_discovery_on_Perturb-CAST_16_0.png