Preprocessing

Notebook-oriented helpers that turn raw single-cell counts into TwinCell's inputs. These are the exact functions used in the tutorial:

pseudobulk() — aggregate single cells into sample-level pseudo-bulk profiles.
pydeseq2() — compute differentially expressed genes (the disease signature) between two conditions.

from deeplife.twincell import pseudobulk, pydeseq2

pseudobulk

Aggregate cells into pseudo-bulk profiles by grouping keys. In the tutorial it is called as:

pdata = pseudobulk(
    adata,
    perturbation="disease",      # obs column for the condition (psoriasis vs normal)
    cell_line="cell_type",       # obs column for the cell type / line
    batch_id="sample_id",        # obs column for the replicate / sample id
    n_min_replicates=20,         # drop groups with fewer cells
)
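The core of the aggregation can be pictured in plain Python. The sketch below is not the `deeplife.twincell` implementation, which works on `AnnData`. It only mirrors the documented behavior under a few assumptions: cells are grouped by the tuple of their `perturbation`, `cell_line` and `batch_id` values, counts are summed per group, and groups with fewer than `n_min_replicates` cells are dropped. The function name `pseudobulk_sketch` and its list/dict inputs are hypothetical.

```python
from collections import defaultdict


def pseudobulk_sketch(counts, obs, keys, n_min_replicates=20):
    """Hypothetical illustration: sum per-cell count vectors within groups.

    counts: list of per-cell count lists (cells x genes)
    obs: list of per-cell metadata dicts, aligned with `counts`
    keys: metadata fields to group by, e.g. ("disease", "cell_type", "sample_id")
    Returns {group_tuple: {"counts": summed_counts, "n_replicates": n_cells}}
    for groups with at least `n_min_replicates` cells.
    """
    summed = {}
    n_cells = defaultdict(int)
    for row, meta in zip(counts, obs):
        group = tuple(meta[k] for k in keys)
        if group not in summed:
            summed[group] = [0] * len(row)
        # Element-wise sum of this cell's counts into its group's total.
        summed[group] = [total + c for total, c in zip(summed[group], row)]
        n_cells[group] += 1
    # Drop groups that have fewer cells than the minimum.
    return {
        group: {"counts": totals, "n_replicates": n_cells[group]}
        for group, totals in summed.items()
        if n_cells[group] >= n_min_replicates
    }


keys = ("disease", "cell_type", "sample_id")
obs = [{"disease": "psoriasis", "cell_type": "T cell", "sample_id": "s1"}] * 3 + [
    {"disease": "normal", "cell_type": "T cell", "sample_id": "s2"}
]
counts = [[1, 0, 2], [0, 3, 1], [2, 2, 0], [5, 5, 5]]
print(pseudobulk_sketch(counts, obs, keys, n_min_replicates=2))
# {('psoriasis', 'T cell', 's1'): {'counts': [3, 5, 3], 'n_replicates': 3}}
```

Summing raw counts rather than averaging them keeps each pseudobulk on the integer count scale that DESeq2-style models expect.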

pseudobulk

pseudobulk(
    adata: AnnData,
    ad_obs: None | DataFrame = None,
    perturbation: None | str = None,
    cell_line: None | str = None,
    batch_id: None | str = None,
    n_min_replicates: int = 20,
    agg_layers: bool = False,
    agg_obsm: bool = False,
) -> AnnData

Aggregate single-cell counts into pseudobulks by grouping keys.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `adata` | `AnnData` | Input AnnData with counts in `.X` and annotations in `.obs`. If present, `.layers` and `.obsm` can also be aggregated (see `agg_layers` and `agg_obsm`). | required |
| `ad_obs` | `None \| DataFrame` | Optional explicit observation table; defaults to `adata.obs`. | `None` |
| `perturbation` | `None \| str` | `.obs` column holding the perturbation (condition) identifier, used for stratification. | `None` |
| `cell_line` | `None \| str` | `.obs` column holding the cell line (or cell type) identifier. | `None` |
| `batch_id` | `None \| str` | `.obs` column holding the batch (replicate/sample) identifier. | `None` |
| `n_min_replicates` | `int` | Minimum number of replicates (cells) a group must contain to be aggregated; smaller groups are dropped. | `20` |
| `agg_layers` | `bool` | Whether to aggregate `.layers`. | `False` |
| `agg_obsm` | `bool` | Whether to aggregate `.obsm`. | `False` |

Returns:

| Type | Description |
| --- | --- |
| `AnnData` | Pseudobulked AnnData with `.X`: sum of counts per pseudobulk; `.layers`: sum of layer counts per pseudobulk (if present in input); `.obsm`: mean of embeddings/coordinates per pseudobulk (if present in input); `.obs`: aggregated metadata, including `n_replicates`; `.var`: copied from the input `adata`. |
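The return value combines two aggregation rules. Counts in `.X` and `.layers` are summed, while `.obsm` embeddings are averaged. The hypothetical helper `aggregate_group` below is not part of the package. It only makes the difference concrete for one group of two cells, using made-up values.

```python
def aggregate_group(rows, how):
    """Column-wise aggregate of equal-length numeric rows (one row per cell).

    how="sum" mirrors the documented treatment of .X / .layers;
    how="mean" mirrors the documented treatment of .obsm.
    """
    if not rows:
        raise ValueError("a group needs at least one cell")
    if how not in ("sum", "mean"):
        raise ValueError(f"unsupported aggregation: {how!r}")
    totals = [sum(column) for column in zip(*rows)]
    if how == "mean":
        return [t / len(rows) for t in totals]
    return totals


# Two cells from the same pseudobulk group (hypothetical values).
cell_counts = [[4, 0, 7], [1, 2, 3]]   # raw counts for 3 genes
cell_umap = [[0.5, -1.0], [1.5, 1.0]]  # 2-D embedding per cell

print(aggregate_group(cell_counts, "sum"))  # [5, 2, 10]
print(aggregate_group(cell_umap, "mean"))   # [1.0, 0.0]
```

Averaging keeps coordinates such as UMAP positions in the same space as the single cells. Summing them would instead scale them with group size.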

pydeseq2

Run PyDESeq2 on the two arms and flag significant DEGs. In the tutorial:

de_results = pydeseq2(
    adata=pdata_control.concatenate(pdata_pert),
    design_factor="disease",     # obs column with the conditions
    control_group="normal",      # reference level in design_factor
    log2fc_sig=1.0,              # |log2 fold change| threshold for `significant`
    mlog10pvalue_sig=1.3,        # -log10(adjusted p) threshold for `significant`
)
significant_degs = de_results[de_results["significant"]].index.tolist()
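When both thresholds are given, each gene receives a boolean `significant` flag. The sketch below shows that rule for a single gene. It assumes the flag combines the two documented cut-offs with a logical AND and inclusive comparisons; the exact operators are not stated here. It also assumes that a missing (NaN) `padj` is never significant. `is_significant` and `mlog10` are illustrative names, not package functions.

```python
import math


def mlog10(p):
    """-log10 of a p-value; p == 0 maps to +inf instead of raising."""
    return math.inf if p == 0 else -math.log10(p)


def is_significant(log2fc, padj, log2fc_sig=1.0, mlog10pvalue_sig=1.3):
    """Assumed rule: |log2FC| >= log2fc_sig AND -log10(padj) >= mlog10pvalue_sig."""
    # A missing adjusted p-value (NaN) never counts as significant here.
    if padj is None or math.isnan(padj):
        return False
    return abs(log2fc) >= log2fc_sig and mlog10(padj) >= mlog10pvalue_sig


print(is_significant(2.0, 0.01))    # True: 4-fold up, padj 0.01
print(is_significant(-1.5, 0.001))  # True: down-regulated genes count too
print(is_significant(0.5, 1e-10))   # False: effect size too small
print(is_significant(2.0, 0.1))     # False: -log10(0.1) = 1.0 < 1.3
```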

pydeseq2

pydeseq2(
    adata: AnnData,
    design_factor: str,
    control_group: str,
    log2fc_sig: float | None = None,
    mlog10pvalue_sig: float | None = None,
) -> DataFrame

Run PyDESeq2 and compute differential expression results for each perturbation versus the control group.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `adata` | `AnnData` | AnnData object with raw counts in `adata.X`. | required |
| `design_factor` | `str` | Column in `adata.obs` holding the experimental conditions (e.g. `"perturbation"`). | required |
| `control_group` | `str` | Value in the `design_factor` column that marks the control (e.g. `"control"`). | required |
| `log2fc_sig` | `float \| None` | Absolute log2 fold-change threshold for the `significant` column. The column is added only when both `log2fc_sig` and `mlog10pvalue_sig` are provided. | `None` |
| `mlog10pvalue_sig` | `float \| None` | -log10(padj) threshold for the `significant` column. | `None` |
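It can help to translate the thresholds back into familiar cut-offs. The arithmetic below uses standard logarithm identities and nothing package-specific. Assuming inclusive comparisons, it shows that the tutorial's values (`mlog10pvalue_sig=1.3`, `log2fc_sig=1.0`) correspond to roughly padj ≤ 0.05 and at least a two-fold change in either direction:

```latex
\begin{aligned}
-\log_{10}(\mathrm{padj}) \ge 1.3 &\iff \mathrm{padj} \le 10^{-1.3} \approx 0.0501,
  \qquad \text{since } -\log_{10}(0.05) \approx 1.301,\\
|\log_2 \mathrm{FC}| \ge 1 &\iff \mathrm{FC} \ge 2 \ \text{or}\ \mathrm{FC} \le \tfrac{1}{2}.
\end{aligned}
```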

Returns:

Type	Description
| Type | Description |
| --- | --- |
| `DataFrame` | A pandas DataFrame indexed by `gene_name`, sorted by `mlog10pvalue_adj` in descending order. |
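Sorting by `mlog10pvalue_adj` in descending order puts the most strongly supported genes first. The stdlib-only sketch below reproduces that ordering. It assumes a hypothetical `{gene_name: padj}` mapping as input and places genes with a missing `padj` last, which is what pandas `sort_values` does with NaN by default. `rank_genes` and the gene names are illustrative only.

```python
import math


def rank_genes(padj_by_gene):
    """Gene names ordered by -log10(padj), largest (most significant) first."""
    def score(padj):
        # Missing values sort last; padj == 0 sorts first.
        if padj is None or math.isnan(padj):
            return -math.inf
        return math.inf if padj == 0 else -math.log10(padj)

    # sorted() is stable, so genes with equal scores keep their input order.
    return sorted(padj_by_gene, key=lambda g: score(padj_by_gene[g]), reverse=True)


print(rank_genes({"GENE1": 0.5, "GENE2": 1e-8, "GENE3": float("nan"), "GENE4": 0.01}))
# ['GENE2', 'GENE4', 'GENE1', 'GENE3']
```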

Batch / CLI use

The two helpers above are thin wrappers over standalone packages that also ship command-line entry points for batch pipelines:

Pseudo-bulk — deeplife.pseudobulk (CLI: twincell-pseudobulk)
Differential expression — deeplife.differential_expression (CLI: twincell-diffexpr)

python -m deeplife.pseudobulk.main --help
python -m deeplife.differential_expression.main --help

For notebooks, prefer the pseudobulk() / pydeseq2() helpers documented above.