Getting started: validating a drug target with TwinCell¶
Question: Is Deucravacitinib an interesting drug for psoriasis?¶
Deucravacitinib is an oral, selective TYK2 inhibitor approved for moderate-to-severe plaque psoriasis. TYK2 relays IL-23, IL-12 and type-I interferon signalling through the JAK–STAT pathway — a central axis of psoriasis — so it is a strong candidate to investigate.
Rather than running a wet-lab experiment, we use TwinCell to ask a causal question directly from patient single-cell data:
Given the genes that change in psoriasis, does inhibiting Deucravacitinib's target (TYK2) causally explain that disease signature?
A high target-validation score means the target sits upstream of many of the disease-associated genes in the interactome — i.e. modulating it is mechanistically plausible for reversing the disease state.
Workflow at a glance
- Load psoriatic vs. healthy skin single-cell data.
- Aggregate cells into pseudo-bulk profiles per sample.
- Compute the psoriasis signature (differentially expressed genes) with PyDESeq2.
- Open a TwinCell session and validate TYK2 against that signature.
- Inspect the causal graph and paths that link the disease genes to the targets.
You need a
DEEPLIFE_API_KEY(from the TwinCell Console). Put it in a local.envfile or your environment before running the notebook.
1. Setup¶
Import the TwinCell SDK helpers and load your API key.
import os
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from dotenv import find_dotenv, load_dotenv
from IPython.display import display
from deeplife.twincell import (
TwinCell,
pseudobulk,
pydeseq2,
read_h5ad,
)
# Load DEEPLIFE_API_KEY from a local .env file (walks up from the notebook's cwd).
load_dotenv(find_dotenv(usecwd=True), override=True)
DEEPLIFE_API_KEY = os.environ.get("DEEPLIFE_API_KEY", "dl-xxx")
2. The drug and its targets¶
TwinCell validates protein targets, not drug names. Deucravacitinib selectively inhibits TYK2, so that is the target we will test against the psoriasis signature.
# Deucravacitinib's primary protein target (HGNC symbol).
deucravacitinib_targets = ["TYK2"]
3. Load the single-cell data¶
We use a public psoriasis skin atlas (GSE162183),
which contains both psoriatic and healthy (normal) skin biopsies annotated by cell
type. read_h5ad accepts local paths as well as s3:// / https:// URIs.
TwinCell expects HGNC gene symbols as the variable index, so we set them up here.
adata = read_h5ad(
"s3://dl-dev-global-personal/jbmorlot/cellxgenes/raw/adata_psoriasis_cxg_GSE162183.h5ad"
)
# Use HGNC gene symbols as the variable index, and drop duplicate symbols.
adata.var.rename(columns={"feature_name": "gene_name"}, inplace=True)
adata.var.set_index("gene_name", inplace=True)
adata = adata[:, ~adata.var.index.duplicated()]
adata
View of AnnData object with n_obs × n_vars = 24126 × 28080
obs: 'donor_id', 'Species', 'CellTypes', 'sample_id', 'protocol_url', 'institute', 'tissue_ontology_term_id', 'tissue_type', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'development_stage_ontology_term_id', 'sex_ontology_term_id', 'suspension_type', 'disease_ontology_term_id', 'library_id', 'library_preparation_batch', 'library_sequencing_run', 'alignment_software', 'manner_of_death', 'sample_source', 'sample_collection_method', 'sample_preservation_method', 'sequenced_fragment', 'reference_genome', 'cell_enrichment', 'gene_annotation_version', 'is_primary_data', 'sampled_site_condition', 'cell_type', 'assay', 'disease', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid'
var: 'GeneSym', 'feature_is_filtered', 'feature_reference', 'feature_biotype', 'feature_length', 'feature_type'
uns: 'batch_condition', 'citation', 'is_pre_analysis', 'organism', 'organism_ontology_term_id', 'schema_reference', 'schema_version', 'study_pi', 'title'
obsm: 'X_umap'
Three obs columns drive the analysis:
disease— the condition (psoriasisvsnormal); this is the perturbation.cell_type— used to analyse a single, homogeneous population.sample_id— the biological replicate id used to build pseudo-bulk profiles.
perturbation_col = "disease" # "psoriasis" vs "normal"
cell_type_col = "cell_type"
batch_id_col = "sample_id" # pseudo-bulk replicate id
adata.obs[[perturbation_col, cell_type_col]].value_counts().reset_index().sort_values(by="cell_type")
| disease | cell_type | count | |
|---|---|---|---|
| 38 | normal | Langerhans cell | 21 |
| 37 | psoriasis | Langerhans cell | 22 |
| 22 | psoriasis | basal cell of epidermis | 207 |
| 33 | normal | basal cell of epidermis | 50 |
| 12 | psoriasis | cytotoxic T cell | 697 |
| 13 | normal | cytotoxic T cell | 644 |
| 21 | psoriasis | dendritic cell, human | 237 |
| 19 | normal | dendritic cell, human | 238 |
| 3 | psoriasis | endothelial cell | 1923 |
| 7 | normal | endothelial cell | 1315 |
| 26 | psoriasis | endothelial cell of lymphatic vessel | 163 |
| 23 | normal | endothelial cell of lymphatic vessel | 197 |
| 41 | normal | erythrocyte | 7 |
| 39 | psoriasis | erythrocyte | 19 |
| 16 | psoriasis | fibroblast | 442 |
| 18 | normal | fibroblast | 400 |
| 9 | normal | fibroblast of papillary layer of dermis | 887 |
| 8 | psoriasis | fibroblast of papillary layer of dermis | 917 |
| 15 | normal | granular cell of epidermis | 462 |
| 11 | psoriasis | granular cell of epidermis | 706 |
| 17 | psoriasis | hair follicular keratinocyte | 409 |
| 31 | normal | hair follicular keratinocyte | 117 |
| 27 | psoriasis | helper T cell | 163 |
| 25 | normal | helper T cell | 181 |
| 30 | psoriasis | macrophage | 117 |
| 20 | normal | macrophage | 238 |
| 28 | normal | melanocyte of skin | 155 |
| 34 | psoriasis | melanocyte of skin | 49 |
| 36 | normal | monocyte | 26 |
| 24 | psoriasis | monocyte | 187 |
| 40 | normal | natural killer cell | 14 |
| 35 | psoriasis | natural killer cell | 49 |
| 5 | normal | pericyte | 1560 |
| 2 | psoriasis | pericyte | 2140 |
| 29 | psoriasis | regulatory T cell | 119 |
| 32 | normal | regulatory T cell | 63 |
| 14 | psoriasis | skin fibroblast | 495 |
| 10 | normal | skin fibroblast | 858 |
| 6 | psoriasis | spinous cell of epidermis | 1341 |
| 1 | normal | spinous cell of epidermis | 2343 |
| 4 | normal | suprabasal keratinocyte | 1593 |
| 0 | psoriasis | suprabasal keratinocyte | 2355 |
4. Pseudo-bulk aggregation¶
Single cells are noisy and statistically dependent within a sample. pseudobulk sums counts
per sample_id × cell_type × disease group, producing a handful of robust replicates
per condition — the right input for differential expression. Groups with fewer than
n_min_replicates cells are dropped.
pdata = pseudobulk(
adata,
perturbation=perturbation_col,
cell_line=cell_type_col,
batch_id=batch_id_col,
n_min_replicates=20,
)
pdata
Building pseudo-bulk: 0%| | 0/125 [00:00<?, ?it/s]
Building pseudo-bulk: 100%|██████████| 125/125 [00:04<00:00, 31.13it/s]
AnnData object with n_obs × n_vars = 95 × 28080
obs: 'donor_id', 'Species', 'CellTypes', 'sample_id', 'protocol_url', 'institute', 'tissue_ontology_term_id', 'tissue_type', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'development_stage_ontology_term_id', 'sex_ontology_term_id', 'suspension_type', 'disease_ontology_term_id', 'library_id', 'library_preparation_batch', 'library_sequencing_run', 'alignment_software', 'manner_of_death', 'sample_source', 'sample_collection_method', 'sample_preservation_method', 'sequenced_fragment', 'reference_genome', 'cell_enrichment', 'gene_annotation_version', 'is_primary_data', 'sampled_site_condition', 'cell_type', 'assay', 'disease', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid', 'n_replicates', 'perturbation', 'cell_line', 'batch_id'
var: 'GeneSym', 'feature_is_filtered', 'feature_reference', 'feature_biotype', 'feature_length', 'feature_type'
5. Pick a cell type and split the arms¶
Deucravacitinib inhibits TYK2, which relays IL-23 / IL-12 / type-I interferon signalling. In psoriasis, dendritic cells are key drivers — they secrete IL-23 / IL-12 that ignite the IL-23/Th17 axis — so we focus on that population. We split the pseudo-bulk into the psoriasis (perturbed) and normal (control) arms for this cell type.
cell_type = "dendritic cell, human"
perturbation = "psoriasis"
control = "normal"
pdata_pert = pdata[
(pdata.obs[perturbation_col] == perturbation)
& (pdata.obs[cell_type_col] == cell_type)
]
pdata_control = pdata[
(pdata.obs[perturbation_col] == control)
& (pdata.obs[cell_type_col] == cell_type)
]
print(f"control: {pdata_control.shape}, perturbed: {pdata_pert.shape}")
control: (3, 28080), perturbed: (3, 28080)
6. The psoriasis signature (differential expression)¶
pydeseq2 runs PyDESeq2 on the two arms and returns a results table. Genes passing the
significance thresholds form the psoriasis signature — the set of transcriptional
changes TwinCell will try to causally explain.
log2fc_sig— minimum absolute log2 fold change.mlog10pvalue_sig— minimum (-\log_{10}(\text{adjusted } p)) (here1.3≈ adjusted p < 0.05).
de_results = pydeseq2(
adata=pdata_control.concatenate(pdata_pert),
design_factor=perturbation_col,
control_group=control,
log2fc_sig=1.0,
mlog10pvalue_sig=1.3,
)
significant_degs = de_results[de_results["significant"]].index.tolist()
print(f"{len(significant_degs)} significant DEGs (psoriasis vs normal)")
sns.scatterplot(
data=de_results, x="log2FoldChange", y="mlog10pvalue_adj", hue="significant"
)
plt.xlabel("log2 fold change (psoriasis vs normal)")
plt.ylabel("-log10 adjusted p-value")
plt.title(f"Psoriasis signature in {cell_type}")
plt.show()
Using None as control genes, passed at DeseqDataSet initialization
Fitting size factors... ... done in 0.01 seconds. Fitting dispersions... ... done in 4.70 seconds. Fitting dispersion trend curve... ... done in 0.86 seconds. Fitting MAP dispersions... ... done in 5.42 seconds. Fitting LFCs... ... done in 5.80 seconds. Calculating cook's distance... ... done in 0.02 seconds. Replacing 0 outlier genes. Running Wald tests... ... done in 2.70 seconds.
Log2 fold change & Wald test p-value: disease psoriasis vs normal
baseMean log2FoldChange lfcSE stat pvalue \
gene_name
CDIN1 6.030223 0.632790 0.846685 0.747374 0.454838
ENSG00000259375 0.819097 -0.029555 2.266034 -0.013043 0.989594
RSL1D1 62.537739 -0.479794 0.384382 -1.248223 0.211950
RIF1 24.050466 0.028556 0.534852 0.053390 0.957421
ENSG00000260988 0.000000 NaN NaN NaN NaN
... ... ... ... ... ...
EXTL3 6.867849 -0.351904 0.795013 -0.442639 0.658027
CAV2 2.712048 0.806493 1.306319 0.617378 0.536985
LINC01833 0.000000 NaN NaN NaN NaN
TMEM11-DT 1.209486 -0.886985 1.822794 -0.486607 0.626537
ENSG00000267205 0.147498 -0.972922 4.451244 -0.218573 0.826983
padj
gene_name
CDIN1 NaN
ENSG00000259375 NaN
RSL1D1 0.744032
RIF1 0.994092
ENSG00000260988 NaN
... ...
EXTL3 NaN
CAV2 NaN
LINC01833 NaN
TMEM11-DT NaN
ENSG00000267205 NaN
[28080 rows x 6 columns]
131 significant DEGs (psoriasis vs normal)
7. Open a TwinCell session¶
A TwinCell session bundles the control arm, the perturbed arm, and the disease signature.
On construction the SDK validates the inputs (shared genes, counts in .X, HGNC-style
DEG symbols) and checks API connectivity — look for a TwinCell: OK ... line in the
output.
tc = TwinCell(
pdata_control=pdata_control,
pdata_pert=pdata_pert,
degs=significant_degs,
model="v1.0",
api_key=DEEPLIFE_API_KEY,
)
pdata_control and pdata_pert share identical var_names order. DEG list: 131 submitted, 129 HGNC-style (usable by TwinCell), 2 not HGNC-style (ignored at inference; e.g. Ensembl ids). TwinCell: OK - input data ready for prediction (3 control and 3 perturbed observations, 131 DEGs submitted, 129 HGNC-style usable); API reachable at 'https://open-deeplife-api.deeplife.co'.
8. Validate Deucravacitinib's target¶
This is the heart of the analysis. For each target we call target_validation, which submits
the two arms + signature and runs the causal model server-side. get_target_score then
returns:
score— how strongly the target explains the disease signature (higher = stronger).percentage_degs_significant— fraction of the disease DEGs with causal support from this target.
We run target validation for TYK2 (the loop also lets you add more targets to compare in a single table).
prediction_ids = {}
score_rows = []
# Run target validation for each Deucravacitinib target - ETA 40s
for target in deucravacitinib_targets:
prediction_ids[target] = tc.target_validation(
target=target,
label=f"Deucravacitinib target {target} vs psoriasis signature",
)
score_rows.append(tc.get_target_score(prediction_id=prediction_ids[target]))
target_scores = pd.concat(score_rows, ignore_index=True)
target_scores
status=pending
TwinCellSession prediction_id: f8e709d4-72b8-4c67-81e6-4d0412744625
status=processing · step=inference_started · 18% status=completed · step=completed · 100%
| id | score | rank | rank_percentage | pvalue | z_score | protein_expressed | is_deg | node_type | n_degs_significant | percentage_degs_significant | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TYK2|PROTEIN | 22.708529 | 575.0 | 4.264313 | None | None | True | False | PROTEIN | 2 | 1.680672 |
9. How do the targets connect to the disease genes?¶
A score is more convincing when you can see the mechanism. plot_causal_graph draws the
interactome subgraph linking the top disease DEGs to the validated target.
# Pick the highest-scoring validated target.
best_target = target_scores.sort_values("score", ascending=False)["id"].iloc[0]
best_prediction_id = prediction_ids[best_target.split("|")[0]]
print(f"Best target: {best_target}")
n_sig_degs = target_scores[target_scores['id']==best_target].n_degs_significant
tc.plot_causal_graph(prediction_id=best_prediction_id, top_n_degs=n_sig_degs)
Best target: TYK2|PROTEIN
10. Inspect the top causal paths¶
get_causal_paths returns the ranked routes from each impacted disease gene to the target.
For a TYK2 target you should see paths funnelling through STAT transcription factors and
other IL-23 / type-I-interferon JAK–STAT components — exactly the biology Deucravacitinib is
designed to dampen.
causal_paths = tc.get_causal_paths(prediction_id=best_prediction_id)
causal_paths.sort_values(by="score", ascending=False).head(15)
| deg | target | path | score | coverage | path_fraction | length | nodes | path_probability | score_deg_given_target | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | IL15 | TYK2|PROTEIN | IL15 → IRF1 → NPM1 → TYK2 | 0.768584 | 1.0 | 0.949464 | 4 | [IL15|RNA, IRF1|PROTEIN, NPM1|PROTEIN, TYK2|PR... | 0.007545 | 0.809492 |
| 1 | IFITM3 | TYK2|PROTEIN | IFITM3 → STAT2 → TYK2 | 0.763776 | 1.0 | 0.859208 | 3 | [IFITM3|RNA, STAT2|PROTEIN, TYK2|PROTEIN] | 0.008616 | 0.888930 |
| 2 | ZC3H12A | TYK2|PROTEIN | ZC3H12A → RELA → HRH1 → GRK2 → GNB2 → PTAFR → ... | 0.526158 | 1.0 | 0.957941 | 7 | [ZC3H12A|RNA, RELA|PROTEIN, HRH1|PROTEIN, GRK2... | 0.002227 | 0.549259 |
| 3 | ASGR2 | TYK2|PROTEIN | ASGR2 → IRF1 → NPM1 → TYK2 | 0.455786 | 1.0 | 0.722857 | 4 | [ASGR2|RNA, IRF1|PROTEIN, NPM1|PROTEIN, TYK2|P... | 0.002566 | 0.630534 |
| 4 | RASGEF1B | TYK2|PROTEIN | RASGEF1B → STAT1 → TYK2 | 0.421086 | 1.0 | 0.950878 | 3 | [RASGEF1B|RNA, STAT1|PROTEIN, TYK2|PROTEIN] | 0.001781 | 0.442840 |
| 5 | INSIG1 | TYK2|PROTEIN | INSIG1 → CCNT2 → HEXIM1 → NPM1 → TYK2 | 0.352654 | 1.0 | 0.853802 | 5 | [INSIG1|RNA, CCNT2|PROTEIN, HEXIM1|PROTEIN, NP... | 0.001408 | 0.413040 |
| 6 | KIF2A | TYK2|PROTEIN | KIF2A → TCF4 → CSNK2A1 → THAP1 → STRBP → EIF2A... | 0.336833 | 1.0 | 0.695527 | 7 | [KIF2A|RNA, TCF4|PROTEIN, CSNK2A1|PROTEIN, THA... | 0.000612 | 0.484285 |
| 7 | LAP3 | TYK2|PROTEIN | LAP3 → STAT1 → TYK2 | 0.313256 | 1.0 | 0.500629 | 3 | [LAP3|RNA, STAT1|PROTEIN, TYK2|PROTEIN] | 0.001983 | 0.625725 |
| 8 | LAP3 | TYK2|PROTEIN | LAP3 → STAT2 → TYK2 | 0.296286 | 1.0 | 0.473508 | 3 | [LAP3|RNA, STAT2|PROTEIN, TYK2|PROTEIN] | 0.001876 | 0.625725 |
| 9 | IL1R1 | TYK2|PROTEIN | IL1R1 → TCF4 → CSNK2A1 → THAP1 → STRBP → EIF2A... | 0.272955 | 1.0 | 0.510312 | 7 | [IL1R1|RNA, TCF4|PROTEIN, CSNK2A1|PROTEIN, THA... | 0.000580 | 0.534878 |
| 10 | HES1 | TYK2|PROTEIN | HES1 → EGR1 → RELA → HRH1 → GRK2 → GNB2 → PTAFR | 0.217661 | 1.0 | 0.519803 | 7 | [HES1|RNA, EGR1|PROTEIN, RELA|PROTEIN, HRH1|PR... | 0.000327 | 0.418738 |
| 11 | ASGR2 | TYK2|PROTEIN | ASGR2 → EGR1 → RELA → HRH1 → GRK2 → GNB2 → PTAFR | 0.168113 | 1.0 | 0.266620 | 7 | [ASGR2|RNA, EGR1|PROTEIN, RELA|PROTEIN, HRH1|P... | 0.000947 | 0.630534 |
| 12 | HES1 | TYK2|PROTEIN | HES1 → TCF4 → CSNK2A1 → THAP1 → STRBP → EIF2AK... | 0.143988 | 1.0 | 0.343862 | 7 | [HES1|RNA, TCF4|PROTEIN, CSNK2A1|PROTEIN, THAP... | 0.000216 | 0.418738 |
| 13 | IFITM3 | TYK2|PROTEIN | IFITM3 → STAT1 → TYK2 | 0.119006 | 1.0 | 0.133876 | 3 | [IFITM3|RNA, STAT1|PROTEIN, TYK2|PROTEIN] | 0.001343 | 0.888930 |
| 14 | IL1R1 | TYK2|PROTEIN | IL1R1 → STAT1 → TYK2 | 0.113968 | 1.0 | 0.213073 | 3 | [IL1R1|RNA, STAT1|PROTEIN, TYK2|PROTEIN] | 0.000242 | 0.534878 |
How much of the psoriasis signature does the target actually reach? get_all_degs lists the
DEGs that mapped onto the interactome, and get_degs_impacted_by_target lists those with
causal support above the model threshold.
all_degs = tc.get_all_degs(prediction_id=best_prediction_id)
impacted_degs = tc.get_degs_impacted_by_target(prediction_id=best_prediction_id)
print(f"{len(all_degs)} disease DEGs mapped onto the interactome")
print(
f"{len(impacted_degs)} are causally impacted by {best_target} "
f"({len(impacted_degs) / max(len(all_degs), 1):.0%} of mapped DEGs)"
)
print("Examples:", impacted_degs[:12])
119 disease DEGs mapped onto the interactome 10 are causally impacted by TYK2|PROTEIN (8% of mapped DEGs) Examples: ['IFITM3', 'IL15', 'ASGR2', 'LAP3', 'ZC3H12A', 'IL1R1', 'KIF2A', 'RASGEF1B', 'HES1', 'INSIG1']
Conclusion¶
Putting it together to answer the question — Is Deucravacitinib an interesting drug for psoriasis?
- We derived a psoriasis signature from patient dendritic cells (psoriasis vs. normal).
- We validated Deucravacitinib's target TYK2 against that signature with TwinCell.
- The target-validation score and the fraction of impacted DEGs quantify how well inhibiting TYK2 explains the disease state, and the causal graph / paths show the JAK–STAT routes connecting the disease genes to the target.
A high score with mechanistically coherent paths (through STAT factors) is supporting evidence that Deucravacitinib's mechanism is relevant to psoriasis — consistent with its approval for plaque psoriasis.
Next steps to try:
- Repeat for other cell types (e.g.
helper T cell,suprabasal keratinocyte) to see where the mechanism is strongest. - Compare against a negative-control target (an unrelated protein) on the same signature — it should score near zero, confirming specificity.
- Validate other psoriasis drugs by swapping in their targets (e.g.
IL17A,IL23A,JAK1).