iggytop.adapters.vdjdb_adapter.VDJDBAdapter

class iggytop.adapters.vdjdb_adapter.VDJDBAdapter(bc, cache_dir=None, receptors_to_include=('TCR', 'BCR'), test=False, filter_10x=False)

BioCypher adapter for the VDJDB database.

This adapter handles the downloading, reading, and processing of the VDJDB database.

__init__(bc, cache_dir=None, receptors_to_include=('TCR', 'BCR'), test=False, filter_10x=False)

Initializes the VDJDBAdapter instance.

Parameters:
  • bc (BioCypher) – An instance of the BioCypher class.

  • cache_dir (str | None) – Directory to cache data. Defaults to None.

  • receptors_to_include (Optional[Sequence[Literal['TCR', 'BCR']]]) – Receptors to include. Defaults to ('TCR', 'BCR'). VDJDB provides only TCR data (see available_receptors).

  • test (bool) – Whether to run in test mode. Defaults to False.

  • filter_10x (bool) – Whether to filter out 10X Genomics datasets. Defaults to False.
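A minimal construction sketch, assuming the iggytop and biocypher packages are installed and that a BioCypher instance built with its default configuration suits your project. The helper name build_vdjdb_adapter is hypothetical. Whether the constructor downloads data immediately is not stated here, so treat the call as potentially network-bound.

```python
def build_vdjdb_adapter(cache_dir=None):
    """Construct a VDJDBAdapter in test mode (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the
    # packages to be installed; calling it does.
    from biocypher import BioCypher
    from iggytop.adapters.vdjdb_adapter import VDJDBAdapter

    bc = BioCypher()  # assumption: the default BioCypher configuration is acceptable
    return VDJDBAdapter(
        bc,
        cache_dir=cache_dir,
        receptors_to_include=("TCR",),  # VDJDB lists only TCR as available
        test=True,                      # test mode, per the parameter list above
        filter_10x=False,
    )
```

Passing cache_dir explicitly keeps downloaded files in a known location between runs.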

Methods

__init__(bc[, cache_dir, ...])

Initializes the VDJDBAdapter instance.

create_anndata()

Creates an AnnData object from the AIRR cell data and saves it to a file in the cache directory.

get_edges()

Generates edges for the VDJdb data.

get_latest_release(bc)

Retrieves the latest release of the VDJDB database from GitHub.

get_nodes()

Generates nodes for the VDJdb data.

read_table(bc, table_path, receptors[, test])

Reads and processes the VDJdb table from the downloaded database file.

set_metadata([version, source_url, ...])

Sets the metadata for the adapter.

Attributes

DB_DIR

Directory name for the downloaded database.

DB_FNAME

File name of the database.

DB_NAME

Name of the database.

REPO_NAME

GitHub repository name for the VDJDB database.

airr_cells

Property to get the list of AIRR cells.

available_receptors

Receptor types available in VDJDB.

cache_dir

Property to get the cache directory.

db_name

Property to get the database name.

metadata

Property to get the adapter metadata.

receptors

Property to get the available receptor types.

table

Property to get the data table.

_process_single_chain(df, chain_type)

Processes single-chain data (TRA only or TRB only).

Parameters:
  • df – The input DataFrame containing the single chain data.

  • chain_type – The type of chain, either 'tra' or 'trb'.

Returns:

A DataFrame with the processed single chain data.

_transform_paired_data_efficient(df)

Efficient transformation that handles all cases correctly. It is required to pair TRA and TRB chains by complex.id, because the two chains of a complex are on different rows in the raw database.

Parameters:

df – The input DataFrame containing the VDJdb data.

Returns:

A DataFrame with the transformed paired data.
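The pairing idea can be sketched with plain dictionaries. This is an illustration of grouping rows by complex.id, not the adapter's implementation. It assumes the raw column names complex.id, gene, cdr3 and antigen.epitope, that gene is either TRA or TRB, and that a complex.id of 0 marks a chain without a recorded partner. The function names and output keys are hypothetical.

```python
from collections import defaultdict


def pair_by_complex_id(rows):
    """Merge per-chain rows into one record per complex (illustrative only)."""
    complexes = defaultdict(dict)  # complex.id -> {"TRA": row, "TRB": row}
    records = []
    for row in rows:
        if row["complex.id"] == 0:
            # Assumption: 0 marks a chain with no recorded partner.
            records.append(_to_record({row["gene"]: row}))
        else:
            complexes[row["complex.id"]][row["gene"]] = row
    records.extend(_to_record(chains) for chains in complexes.values())
    return records


def _to_record(chains):
    """Flatten {"TRA": row, "TRB": row} into one dict; a missing chain becomes None."""
    tra, trb = chains.get("TRA"), chains.get("TRB")
    source = tra or trb  # assumption: both chains of a complex share one epitope
    return {
        "tra_cdr3": tra["cdr3"] if tra else None,
        "trb_cdr3": trb["cdr3"] if trb else None,
        "epitope": source["antigen.epitope"],
    }
```

Complexes with only one chain present fall out of the same code path as unpaired rows, which is one way to cover every case with a single transformation.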

get_edges()

Generates edges for the VDJdb data.

This method yields edge data for chain 1 to chain 2, chain 1 to epitope, and chain 2 to epitope.
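A sketch of a generator producing the three edge kinds named above. It assumes the 5-tuple edge shape that BioCypher accepts, (edge_id, source_id, target_id, label, properties). The record keys, edge labels and empty property dicts are hypothetical and do not reflect the adapter's actual schema.

```python
def iter_edges(records):
    """Yield (edge_id, source_id, target_id, label, properties) tuples."""
    for rec in records:
        chain_1 = rec.get("chain_1")
        chain_2 = rec.get("chain_2")
        epitope = rec["epitope"]
        if chain_1 and chain_2:
            # Only paired records link the two chains to each other.
            yield (None, chain_1, chain_2, "chain_1_to_chain_2", {})
        if chain_1:
            yield (None, chain_1, epitope, "chain_1_to_epitope", {})
        if chain_2:
            yield (None, chain_2, epitope, "chain_2_to_epitope", {})
```

Because the function is a generator, edges are produced lazily and can be streamed to a writer without building the full list in memory.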

get_latest_release(bc)

Retrieves the latest release of the VDJDB database from GitHub.

Parameters:

bc (BioCypher) – An instance of the BioCypher class.

Return type:

str

Returns:

The file path of the downloaded database.

Raises:

FileNotFoundError – If the database file cannot be found after downloading.
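For orientation, a standalone sketch of locating a release archive through the GitHub REST API endpoint for the latest release of a repository. It does not reproduce how the adapter downloads data, which goes through the BioCypher instance. The helper names are hypothetical, and the assumption that the release ships a .zip asset should be checked against the repository.

```python
import json
from urllib.request import urlopen

REPO_NAME = "antigenomics/vdjdb-db"


def latest_release_api_url(repo=REPO_NAME):
    """Build the GitHub REST API URL for a repository's latest release."""
    return f"https://api.github.com/repos/{repo}/releases/latest"


def pick_zip_asset(release):
    """Return the download URL of the first .zip asset in a release payload."""
    for asset in release.get("assets", []):
        if asset.get("name", "").endswith(".zip"):
            return asset["browser_download_url"]
    raise FileNotFoundError("no .zip asset found in the release")


def fetch_latest_zip_url(repo=REPO_NAME):
    """Query GitHub (network access required) and return the archive URL."""
    with urlopen(latest_release_api_url(repo), timeout=30) as response:
        return pick_zip_asset(json.load(response))
```

Keeping the payload parsing in pick_zip_asset separate from the network call makes the selection logic testable offline.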

get_nodes()

Generates nodes for the VDJdb data.

This method yields node data for chain 1, chain 2, and epitopes.
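A sketch of a node generator for the three node kinds named above, assuming the 3-tuple node shape that BioCypher accepts, (node_id, label, properties). The record keys and labels are hypothetical, and the deduplication of repeated IDs is an assumption about desirable behaviour rather than a documented feature of the adapter.

```python
def iter_nodes(records):
    """Yield (node_id, label, properties) tuples, each node ID at most once."""
    seen = set()
    for rec in records:
        for key in ("chain_1", "chain_2", "epitope"):
            node_id = rec.get(key)
            if node_id is None or node_id in seen:
                continue  # skip missing chains and IDs already emitted
            seen.add(node_id)
            yield (node_id, key, {})
```

Tracking emitted IDs in a set means an epitope recognised by many receptors is yielded once, not once per record.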

read_table(bc, table_path, receptors, test=False)

Reads and processes the VDJdb table from the downloaded database file.

Parameters:
  • bc (BioCypher) – An instance of the BioCypher class.

  • table_path (str) – Path to the table file.

  • receptors (list[str]) – List of receptor types to include in the table. Currently ignored, because VDJDB provides only TCR data.

  • test (bool) – If True, loads only a subset of the data for testing (default is False).

Return type:

DataFrame

Returns:

A DataFrame containing the processed table data.

Raises:

FileNotFoundError – If the table file cannot be found.
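The reading step can be approximated with the standard library. This sketch assumes the table is a tab-separated text file with a header row, returns a list of dicts instead of a DataFrame, and uses a hypothetical test_rows limit to mimic the test subset. The adapter's own subset size and processing steps are not documented here.

```python
import csv
from itertools import islice
from pathlib import Path


def read_vdjdb_rows(table_path, test=False, test_rows=100):
    """Read a tab-separated table into a list of dicts (illustrative only)."""
    path = Path(table_path)
    if not path.is_file():
        raise FileNotFoundError(f"table file not found: {path}")
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # In test mode, stop after the first test_rows data rows.
        rows = islice(reader, test_rows) if test else reader
        return list(rows)
```

Checking for the file before opening it gives a clear error message that names the missing path.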

DB_DIR = 'vdjdb_latest'

Directory name for the downloaded database.

DB_FNAME = 'vdjdb.txt'

File name of the database.

DB_NAME: str = 'VDJDB'

Name of the database.

REPO_NAME = 'antigenomics/vdjdb-db'

GitHub repository name for the VDJDB database.

available_receptors: list[str] = ['TCR']

Receptor types available in VDJDB.
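Since the constructor default requests ('TCR', 'BCR') while only 'TCR' is available, one plausible way to reconcile the two is an intersection. The sketch below is an assumption about that behaviour, not the adapter's documented logic, and resolve_receptors is a hypothetical name.

```python
AVAILABLE_RECEPTORS = ["TCR"]


def resolve_receptors(requested, available=AVAILABLE_RECEPTORS):
    """Keep only the requested receptor types that the database provides."""
    if requested is None:
        # Assumption: no explicit request means "everything available".
        return list(available)
    return [receptor for receptor in available if receptor in requested]
```

With this rule, requesting only 'BCR' yields an empty list, which a caller could treat as a signal to skip the database.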