TwinCell¶

The headline reference for TwinCell: the data requirements and the high-level TwinCell study class used throughout the tutorial and the use cases.

The data-preparation helpers (pseudobulk() / pydeseq2()) live on the Preprocessing page.

from deeplife.twincell import TwinCell, read_h5ad

Data requirements¶

TwinCell accepts transcriptomic data as .h5ad (AnnData). For target validation you bring a target and one or two cell states (e.g. disease vs. healthy).

Input files must contain:

Raw gene expression counts: not normalized, stored in adata.X or in a named layer (selected via raw_layer_name, which refers to a counts layer in adata.layers, not adata.raw).
Condition labels: a column in adata.obs identifying each cell/sample's state (e.g. "ctrl" vs. "stim").
Batch / sample grouping: a column in adata.obs for sample-level grouping (e.g. replicate), passed via batch_id_col. These are sample identifiers, not sequencing-batch labels.
Cell type annotations (single-cell only): a column in adata.obs; each prediction analyzes one cell type at a time.

Example obs schema:

Column	Required?	Example values	Notes
`condition`	yes	`ctrl`, `stim`	Two values; used as the contrast.
`sample_id`	yes	`donor_1`, `donor_2`, …	Sample-level grouping for pseudo-bulk / DE.
`cell_type`	single-cell only	`CD4 T cells`, `Monocytes`, …	One prediction per cell type.

Other requirements:

One or two cell states, with at least two biological replicates per arm for differential expression.
Gene identifiers as standard gene symbols in adata.var_names.
The dataset must fit in memory during local preprocessing — subset to the cell types of interest for very large datasets.
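
The replicate requirement can be checked locally before anything is submitted. The following is a minimal sketch, assuming the condition and sample columns are passed as plain iterables (for example adata.obs["condition"] and adata.obs["sample_id"], using the column names from the example schema above). The helper names replicates_per_condition and arms_missing_replicates are hypothetical and not part of deeplife.

```python
from collections import defaultdict


def replicates_per_condition(conditions, sample_ids):
    """Count distinct sample ids per condition label (hypothetical helper)."""
    samples = defaultdict(set)
    for condition, sample in zip(conditions, sample_ids):
        samples[str(condition)].add(str(sample))
    return {condition: len(ids) for condition, ids in samples.items()}


def arms_missing_replicates(conditions, sample_ids, minimum=2):
    """Return the condition labels with fewer than `minimum` distinct samples."""
    counts = replicates_per_condition(conditions, sample_ids)
    return sorted(label for label, n in counts.items() if n < minimum)


# One entry per cell: the "stim" arm only has cells from a single donor.
conditions = ["ctrl", "ctrl", "ctrl", "stim", "stim"]
sample_ids = ["donor_1", "donor_1", "donor_2", "donor_3", "donor_3"]
print(replicates_per_condition(conditions, sample_ids))
print(arms_missing_replicates(conditions, sample_ids))
```

An empty list from arms_missing_replicates means every arm has at least two distinct samples, which is the minimum stated above for differential expression.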

Raw counts only

TwinCell requires raw counts — the pipeline handles normalization internally. Passing pre-normalized data produces unreliable results.
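
A quick heuristic can catch normalized input before it reaches the pipeline. This sketch only tests that values are finite, non-negative whole numbers; it is an illustration rather than the validation TwinCell itself performs, and the name looks_like_raw_counts is hypothetical.

```python
import math


def looks_like_raw_counts(values, tolerance=1e-9):
    """Return True if every value is a finite, non-negative whole number."""
    for value in values:
        number = float(value)
        if not math.isfinite(number) or number < 0:
            return False
        if abs(number - round(number)) > tolerance:
            return False
    return True


print(looks_like_raw_counts([0, 3, 12.0]))      # integer-valued counts
print(looks_like_raw_counts([0.0, 1.386, 2.3]))  # log-normalized values
```

With AnnData, the values could come from adata.X.ravel() for a dense matrix or adata.X.data for a SciPy CSR/CSC matrix. Passing the check does not prove the data are raw counts; failing it is a strong hint that they are not.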

Reading data¶

read_h5ad ¶

read_h5ad(
    path_or_url: str | Path,
    *,
    destination: str | Path | None = None,
    timeout_seconds: float = 300.0,
    sanitize: bool = True,
) -> AnnData

Read AnnData from a local .h5ad file or a remote URI.

Local: pathlib.Path or a filesystem string (including file:// URIs). The file must exist; destination is ignored.

Remote: http://, https://, or s3:// URIs. Streams to destination when set, otherwise to a temporary file that is removed after load. For s3://, uses boto3 with the default credential chain.

Parameters:

Name	Type	Description	Default
`path_or_url`	`str \| Path`	Path to a local `.h5ad`, or `https://` / `http://` / `s3://` URI.	required
`destination`	`str \| Path \| None`	Optional output path when downloading a remote object (parent dirs are created). Ignored for local paths.	`None`
`timeout_seconds`	`float`	HTTP client timeout for HTTP(S); connect + read timeouts for S3.	`300.0`
`sanitize`	`bool`	When True (default), coerce `obs` and `var` columns to string dtype and ensure unique `obs_names` / `var_names` (in-place on the returned object). Pass `False` to return the object exactly as stored in the file.	`True`

Returns:

Type	Description
`AnnData`	In-memory `anndata.AnnData`.

Raises:

Type	Description
`FileNotFoundError`	When a local path does not exist or is not a file.
`ValueError`	When a remote URI uses an unsupported scheme or malformed `s3://`.
`HTTPError`	On HTTP(S) network or status errors.
`ClientError`	On S3 API errors (e.g. missing object).

When sanitize is True, duplicate obs / var name warnings from AnnData during file load are suppressed because names are uniquified immediately after.
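
The local/remote distinction described above can be pictured as a dispatch on the URI scheme. This is an illustrative sketch using only the standard library, not the implementation inside read_h5ad; the function name classify_source is hypothetical, and Windows drive-letter strings such as C:\data are not handled (wrap those in pathlib.Path first).

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(path_or_url):
    """Classify an input as 'local', 'http' or 's3' (illustrative only)."""
    if isinstance(path_or_url, Path):
        return "local"
    parsed = urlparse(str(path_or_url))
    if parsed.scheme in ("", "file"):
        return "local"
    if parsed.scheme in ("http", "https"):
        return "http"
    if parsed.scheme == "s3":
        # An s3:// URI needs both a bucket (netloc) and an object key (path).
        if not parsed.netloc or not parsed.path.strip("/"):
            raise ValueError(f"malformed s3 URI: {path_or_url!r}")
        return "s3"
    raise ValueError(f"unsupported scheme: {parsed.scheme!r}")


print(classify_source("data/pbmc.h5ad"))
print(classify_source("s3://bucket/key.h5ad"))
```

The two ValueError branches mirror the documented failure modes: an unsupported scheme and a malformed s3:// URI.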

TwinCell study¶

The notebook-oriented entry point. Construct from a control + perturbed AnnData pair plus a DEG list, then call target_validation() and inspect the result with the score / causal-path / graph methods (see the tutorial).

The methods used in the tutorial are target_validation(), get_target_score(), get_causal_paths(), plot_causal_graph(), get_degs_impacted_by_target(), and get_all_degs().

TwinCell ¶

TwinCell(
    *,
    pdata_control: AnnData,
    pdata_pert: AnnData | None = None,
    degs: list[str] | None = None,
    model: str = DEFAULT_TWINCELL_MODEL_VERSION,
    api_key: str,
    base_url: str | None = None,
    validate_on_init: bool = True,
    max_obs_per_anndata: int | None = None,
    check_api_on_init: bool = True,
    api_check_timeout_seconds: float = 15.0,
)

High-level handle for a split control vs perturbed AnnData pair and DEG list.

Provide expression matrices in adata.X and HGNC-style symbols in degs. This workflow expects two objects (not a merged pseudo-bulk matrix); see validate_twincell_split_anndata().

Suggested flow: run target_id() to submit data and obtain a prediction, then causal_analysis() for graph-style follow-up on a protein target. Use target_validation() for the integrated validation path: it submits the same split arms via POST /v1/predictions with job_type=target_validation (no separate causal POST). Use plot_causal_graph() for graphs from causal_analysis() when you hold a {"causal": ...} dict. For target validation, use plot_causal_graph() on the instance with top_n_degs (GET .../causal-graph).

On construction, local split validation runs and (by default) a quick API connectivity check (GET /health plus authenticated GET /v1/predictions). Progress and a final ready message are printed to stdout for notebooks.

Call close() when finished to release HTTP resources.
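
Because close() takes no arguments, the standard-library contextlib.closing wrapper can guarantee the call even when an exception is raised. A minimal sketch, assuming tc is an already constructed TwinCell instance; run_and_close is a hypothetical helper name.

```python
from contextlib import closing


def run_and_close(tc, work):
    """Call work(tc) and release HTTP resources via tc.close() afterwards."""
    with closing(tc) as handle:
        return work(handle)
```

For example, run_and_close(tc, lambda t: t.list_predictions()) returns the prediction history and closes the handle whether or not the call succeeds.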

prediction_id `property` ¶

prediction_id: UUID | None

Active prediction id on the internal TwinCellSession.

session `property` ¶

session: TwinCellSession

Underlying HTTP session (shared client, active prediction_id).

close ¶

close() -> None

Close the underlying TwinCellSession HTTP session if it was opened.

client ¶

client(*, reuse: bool = True) -> Any

Return a bare DeepLifeClient.

target_id ¶

target_id(
    *,
    pdata_pert: AnnData | None = None,
    degs: list[str] | None = None,
    label: str | None = None,
    model_version: str | None = None,
    wait: bool = True,
    timeout_seconds: float = 10 * 60,
    poll_interval_seconds: float = 2.0,
    max_rows: int = DEFAULT_PREDICTION_MAX_ROWS,
    max_columns: int | None = DEFAULT_PREDICTION_MAX_COLUMNS,
    print_prediction_id: bool = True,
) -> Any

Submit split arms and DEGs, wait for the remote prediction, and return status.

Uses adata.X only; validation already ran at construction.

Parameters:

Name	Type	Description	Default
`pdata_pert`	`AnnData \| None`	Perturbed pseudo-bulk AnnData. Falls back to the instance attribute set at construction if not provided here.	`None`
`degs`	`list[str] \| None`	List of DEG gene symbols. Falls back to the instance attribute set at construction if not provided here.	`None`
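
The wait, timeout_seconds, and poll_interval_seconds arguments suggest a poll-until-finished loop. The sketch below shows that general pattern under stated assumptions; it is not the library's implementation. fetch_status and is_done are hypothetical callables supplied by the caller, so no status names are assumed.

```python
import time


def wait_until_done(fetch_status, is_done, *, timeout_seconds=600.0,
                    poll_interval_seconds=2.0, sleep=time.sleep,
                    clock=time.monotonic):
    """Poll fetch_status() until is_done(status) or the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        # Stop if the next poll would land after the deadline.
        if clock() + poll_interval_seconds > deadline:
            raise TimeoutError(f"not finished after {timeout_seconds} s")
        sleep(poll_interval_seconds)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting; a monotonic clock is used so that system clock adjustments do not shorten or extend the timeout.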

build_differential_causal_graph ¶

build_differential_causal_graph(
    *,
    max_rows: int = DEFAULT_PREDICTION_MAX_ROWS,
    max_columns: int | None = DEFAULT_PREDICTION_MAX_COLUMNS,
    prediction_id: str | UUID | None = None,
) -> Any

Refresh prediction results and cache the influence preview matrix.

influence_matrix ¶

influence_matrix() -> Any

Return the cached influence preview matrix, fetching from the service if needed.

last_prediction ¶

last_prediction() -> PredictionStatusResponse | None

Last prediction payload held by the internal TwinCellSession.

use_prediction ¶

use_prediction(prediction_id: str | UUID) -> DataFrame

Select a completed prediction for subsequent getters on this instance.

After target_validation(), the session already points at that run. Use this to switch to another row from list_predictions(). Validates the id against the API and returns a one-row summary (same columns as list_predictions()).

Raises:

Type	Description
`NotFoundError`	If the prediction does not exist or is not accessible.

list_predictions ¶

list_predictions(
    *, limit: int = 50, cursor: str | None = None
) -> DataFrame

List past predictions (prediction_id, status, label, job_type, target).

Uses the same API key and base URL as this TwinCell instance. For the raw PredictionsListResponse, call list_predictions() on the TwinCellSession returned by the session property.

Pagination: when the API returns next_cursor, it is stored in df.attrs["next_cursor"]; pass it as cursor= on the next call.

Example:

history = tc.list_predictions()
pid = history.iloc[0]["prediction_id"]
tc.use_prediction(pid)
tc.get_target_score(prediction_id=pid)
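
To walk the full history, the next_cursor value can be fed back into list_predictions() until it is no longer present. A minimal sketch, assuming tc is a constructed TwinCell instance and that a missing or empty next_cursor means the last page was reached; iter_prediction_pages is a hypothetical helper name.

```python
def iter_prediction_pages(tc, *, limit=50):
    """Yield each page from tc.list_predictions() until no cursor remains."""
    cursor = None
    while True:
        page = tc.list_predictions(limit=limit, cursor=cursor)
        yield page
        # The cursor for the following page, if any, lives in DataFrame.attrs.
        cursor = page.attrs.get("next_cursor")
        if not cursor:
            break
```

The pages can then be combined with pandas.concat(list(iter_prediction_pages(tc)), ignore_index=True).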

simulate ¶

simulate(
    targets: list[str] | None = None,
    *,
    prediction_id: str | UUID | None = None,
    max_rows: int = DEFAULT_PREDICTION_MAX_ROWS,
    max_columns: int | None = DEFAULT_PREDICTION_MAX_COLUMNS,
) -> dict[str, Any]

Merge prediction scores with rows whose gene symbol is in degs.

causal_analysis ¶

causal_analysis(
    *,
    target_id: PredictionStatusResponse | None = None,
    target: str,
    top_n_causal_degs: int = 1000,
    min_path_fraction: float = 0.1,
    min_path_probability: float = 0.0001,
    max_path_length: int | None = None,
    prediction_id: str | UUID | None = None,
    wait: bool = True,
    timeout_seconds: float = 10 * 60,
    poll_interval_seconds: float = 2.0,
    max_rows: int = DEFAULT_PREDICTION_MAX_ROWS,
    max_columns: int | None = DEFAULT_PREDICTION_MAX_COLUMNS,
) -> dict[str, Any]

Request a causal graph analysis for a protein target on a finished prediction.

Loads preview results locally to choose API parameters and to build a companion df_paths table. The remote job returns status and may include a graph image; use plot_causal_graph() to display it.

Normally call after target_id() for the same prediction. For the optional alternate mode, see target_validation().

Parameters:

Name	Type	Description	Default
`target_id`	`PredictionStatusResponse \| None`	Completed prediction handle. If omitted, the active session prediction is used.	`None`
`target`	`str`	Protein entity id, e.g. `BRAF\|PROTEIN`.	required
`top_n_causal_degs`	`int`	Caps ranked inputs used for `min_degs_fold_uniform` and for `df_paths`.	`1000`
`min_path_fraction`	`float`	Minimum path mass fraction per DEG (API default 0.1).	`0.1`
`min_path_probability`	`float`	Minimum raw path probability (API default 1e-4).	`0.0001`
`max_path_length`	`int \| None`	Optional maximum path length in nodes.	`None`
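
Entity ids in this reference pair a gene symbol with a kind, as in BRAF|PROTEIN for a target and NDRG4|RNA for a DEG (the backslash before the pipe in the tables appears to be table escaping only). The helpers below are a sketch that assumes this SYMBOL|KIND shape generalizes from those two examples; make_entity_id and split_entity_id are hypothetical names, not part of deeplife.

```python
def make_entity_id(symbol, kind="PROTEIN"):
    """Build an id such as 'BRAF|PROTEIN' from a bare gene symbol (assumed format)."""
    symbol = symbol.strip()
    if not symbol or "|" in symbol:
        raise ValueError(f"expected a bare gene symbol, got {symbol!r}")
    return f"{symbol}|{kind}"


def split_entity_id(entity_id):
    """Split 'NDRG4|RNA' into ('NDRG4', 'RNA'); reject anything else."""
    symbol, sep, kind = entity_id.partition("|")
    if not sep or not symbol or not kind or "|" in kind:
        raise ValueError(f"not a SYMBOL|KIND id: {entity_id!r}")
    return symbol, kind


print(make_entity_id("BRAF"))
print(split_entity_id("NDRG4|RNA"))
```

Building the target string through one helper avoids passing a bare symbol where the API expects the full entity id.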

Returns:

Name	Type	Description
`dict`	`dict[str, Any]`	`df_paths` (local preview table); `causal_analysis` and its alias `causal`, holding the terminal `deeplife.twincell.http.models.CausalAnalysisStatusResponse` from `GET /v1/causal-analysis/{id}` (use `.artifacts` for all presigned outputs); plus `target`, `prediction_id`, `top_degs`, `min_degs_fold_uniform`, `top_n_causal_degs`.

target_validation ¶

target_validation(
    *,
    pdata_pert: AnnData | None = None,
    degs: list[str] | None = None,
    target_id: PredictionStatusResponse | None = None,
    target: str,
    label: str | None = None,
    top_n_causal_degs: int = 1000,
    deg_significance_fold: float = 1.0,
    min_path_fraction: float = 0.1,
    min_path_probability: float = 0.0001,
    max_path_length: int | None = None,
    prediction_id: str | UUID | None = None,
    wait: bool = True,
    timeout_seconds: float = 10 * 60,
    poll_interval_seconds: float = 2.0,
    max_rows: int = DEFAULT_PREDICTION_MAX_ROWS,
    max_columns: int | None = DEFAULT_PREDICTION_MAX_COLUMNS,
) -> UUID

Integrated target_validation run (split uploads via POST /v1/predictions).

Submits pdata_control, pdata_pert, and degs with job_type='target_validation'. The API creates a new prediction row and enqueues the prediction worker with job_type=target_validation (no separate causal-analysis row); this is not a follow-up on an existing inference prediction id.

When a local influence preview exists (see prediction_id / target_id below), fold parameters align with causal_analysis(). Otherwise defaults are used.

After completion, use get_target_score(), get_all_degs(), get_degs_impacted_by_target(), get_causal_paths(), get_intermediary_nodes(), and plot_causal_graph() for allowed outputs (external tiers do not expose presigned artifact URLs).

Parameters:

Name	Type	Description	Default
`pdata_pert`	`AnnData \| None`	Perturbed pseudo-bulk AnnData. Falls back to the instance attribute set at construction if not provided here.	`None`
`degs`	`list[str] \| None`	List of DEG gene symbols. Falls back to the instance attribute set at construction if not provided here.	`None`
`target_id`	`PredictionStatusResponse \| None`	Optional completed prediction handle used only to refresh local influence previews. Omit for standalone TV (no prior `target_id()`).	`None`
`target`	`str`	Protein entity id, e.g. `BRAF\|PROTEIN`.	required
`label`	`str \| None`	Optional run label stored on the prediction (visible in `list_predictions()`).	`None`
`top_n_causal_degs`	`int`	Same role as in `causal_analysis()` when preview data is available.	`1000`
`deg_significance_fold`	`float`	Fold for the embedded `target_id` step (default 1.0).	`1.0`
`min_path_fraction`	`float`	Minimum path mass fraction for the causal step.	`0.1`
`min_path_probability`	`float`	Minimum raw path probability for the causal step.	`0.0001`
`max_path_length`	`int \| None`	Optional maximum path length in nodes.	`None`

Returns:

Type	Description
`UUID`	The `prediction_id` (`uuid.UUID`) for this run. The validation `target` is remembered for `get_target_score()` when `target` is omitted there.
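
The integrated path and its follow-up getters can be chained in one helper. This is a sketch under stated assumptions: tc is a TwinCell instance constructed with pdata_control, pdata_pert, and degs, so target_validation() can fall back to those attributes; validate_target is a hypothetical name; and top_n_degs=20 is an arbitrary illustrative value, not a recommended setting.

```python
def validate_target(tc, target, *, top_n_degs=20):
    """Run target_validation() on an existing handle and gather its outputs."""
    prediction_id = tc.target_validation(target=target)
    return {
        "prediction_id": prediction_id,
        "score": tc.get_target_score(prediction_id=prediction_id),
        "mapped_degs": tc.get_all_degs(prediction_id=prediction_id),
        "impacted_degs": tc.get_degs_impacted_by_target(
            prediction_id=prediction_id
        ),
        "causal_paths": tc.get_causal_paths(prediction_id=prediction_id),
        "intermediary_nodes": tc.get_intermediary_nodes(
            prediction_id=prediction_id, top_n_degs=top_n_degs
        ),
    }
```

Passing prediction_id explicitly to each getter keeps the results tied to this run even if the session is later switched with use_prediction().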

get_target_score ¶

get_target_score(
    *,
    target: str | None = None,
    prediction_id: str | UUID | None = None,
    max_rows: int = DEFAULT_PREDICTION_MAX_ROWS,
    max_columns: int | None = DEFAULT_PREDICTION_MAX_COLUMNS,
    reload: bool = False,
) -> Any

Return the target-validation score row for target (single-row DataFrame).

Uses the latest active prediction when prediction_id is omitted. The default target is the one passed to the most recent target_validation().

External API responses expose only id, score, and percentage_degs_significant (via target_validation_score on GET). Internal users with full results still receive all score columns.

get_causal_paths ¶

get_causal_paths(
    *, prediction_id: str | UUID | None = None
) -> Any

Return causal path rows for impacted DEGs (DataFrame).

Uses GET /v1/predictions/{id}/causal-paths. The API returns redacted rows for DEGs from get_degs_impacted_by_target() (server-filtered).

Parameters:

Name	Type	Description	Default
`prediction_id`	`str \| UUID \| None`	Defaults to the latest `target_validation()` run.	`None`

get_all_degs ¶

get_all_degs(
    *, prediction_id: str | UUID | None = None
) -> list[str]

Return DEGs that mapped onto the interactome.

Uses GET /v1/predictions/{id}/mapped-degs.

Parameters:

Name	Type	Description	Default
`prediction_id`	`str \| UUID \| None`	Defaults to the latest `target_validation()` run.	`None`

get_degs_impacted_by_target ¶

get_degs_impacted_by_target(
    *, prediction_id: str | UUID | None = None
) -> list[str]

Return DEGs whose influence score on the prediction's target exceeds the worker threshold.

Uses GET /v1/predictions/{id}/degs-impacted. The target is always taken from the prediction record (the target submitted with that prediction_id).

Parameters:

Name	Type	Description	Default
`prediction_id`	`str \| UUID \| None`	Defaults to the latest `target_validation()` run.	`None`

get_intermediary_nodes ¶

get_intermediary_nodes(
    *,
    prediction_id: str | UUID | None = None,
    top_n_degs: int,
    deg: str | None = None,
) -> list[str]

Return unique gene symbols from all nodes on causal paths, for example for use as Enrichr gene lists.

Uses GET /v1/predictions/{id}/intermediary-proteins. top_n_degs uses the same server-side filter as plot_causal_graph().

Parameters:

Name	Type	Description	Default
`prediction_id`	`str \| UUID \| None`	Defaults to the latest `target_validation()` run.	`None`
`top_n_degs`	`int`	Top DEGs by `score_deg_given_target` (server-side filter).	required
`deg`	`str \| None`	Optional DEG filter (e.g. `"NDRG4\|RNA"`); omit for all top DEGs.	`None`

get_intermediary_proteins ¶

get_intermediary_proteins(
    *,
    prediction_id: str | UUID | None = None,
    top_n_degs: int,
    deg: str | None = None,
) -> list[str]

Alias for get_intermediary_nodes() (API route name retained).

plot_causal_graph ¶

plot_causal_graph(
    *,
    top_n_degs: int,
    prediction_id: str | UUID | None = None,
    dpi: int | None = None,
    figsize: tuple[float, float] | None = None,
    display_dpi: float | None = None,
) -> Any

Show the causal graph via GET .../causal-graph (server Parquet replay).

Use after target_validation(). top_n_degs is required; it follows the semantics of top_n_causal_degs in TwinCell's plot_causal_graph(...).

Parameters:

Name	Type	Description	Default
`top_n_degs`	`int`	Top DEGs by `score_deg_given_target` (server-side filter).	required
`prediction_id`	`str \| UUID \| None`	Defaults to the latest `target_validation()` run.	`None`
`dpi`	`int \| None`	PNG render resolution on the API (default 200, TwinCell parity).	`None`
`figsize`	`tuple[float, float] \| None`	Matplotlib figure size in inches for notebook display. When omitted, width follows the API PNG aspect ratio (TwinCell layout).	`None`
`display_dpi`	`float \| None`	Matplotlib figure DPI for display only (default 100).	`None`

The figure displays once in Jupyter when this call is the last line in a cell.

extract_causal_subgraph ¶

extract_causal_subgraph(
    simulation: Mapping[str, Any],
    *,
    target: str | None = None,
    identification_result: TargetIdentificationResult | None = None,
    n_degs: int | None = None,
    significance_threshold: float | None = None,
    prediction_id: str | UUID | None = None,
    wait: bool = True,
    timeout_seconds: float = 10 * 60,
    poll_interval_seconds: float = 2.0,
) -> dict[str, Any]

Pick a focal protein target and run causal_analysis()-style remote analysis.

display ¶

display(subgraph: Mapping[str, Any]) -> Any

Show a causal subgraph PNG from extract_causal_subgraph().

path_analysis ¶

path_analysis(
    context: Mapping[str, Any], *, deg: str, target: str
) -> dict[str, Any]

Summarise the influence score between one RNA token and one protein target.

filter_predictions_by_degs ¶

filter_predictions_by_degs(
    *, degs: list[str]
) -> TargetIdentificationResult

Filter prediction rows whose gene token intersects degs.

as_dict ¶

as_dict() -> dict[str, Any]

Summary fields useful for logging or UI.
