iggytop.adapters.vdjdb_adapter.VDJDBAdapter
- class iggytop.adapters.vdjdb_adapter.VDJDBAdapter(bc, cache_dir=None, receptors_to_include=('TCR', 'BCR'), test=False, filter_10x=False)
BioCypher adapter for the VDJDB database.
This adapter handles the downloading, reading, and processing of the VDJDB database.
- __init__(bc, cache_dir=None, receptors_to_include=('TCR', 'BCR'), test=False, filter_10x=False)
Initializes the BaseAdapter instance.
- Parameters:
bc (BioCypher) – An instance of the BioCypher class.
cache_dir (str | None) – Directory to cache data. Defaults to None.
receptors_to_include (Optional[Sequence[Literal['TCR', 'BCR']]]) – Receptors to include. Defaults to ("TCR", "BCR").
test (bool) – Whether to run in test mode. Defaults to False.
filter_10x (bool) – Whether to filter out 10X Genomics datasets. Defaults to False.
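The constructor accepts ("TCR", "BCR") by default, while the available_receptors attribute lists only "TCR". The sketch below shows one plausible way to reconcile the two by keeping only requested receptors that VDJDB provides. The helper resolve_receptors is hypothetical, and treating None as "all available receptors" is an assumption, not documented behavior.

```python
from typing import Optional, Sequence

# Mirrors the documented class attribute VDJDBAdapter.available_receptors.
AVAILABLE_RECEPTORS = ["TCR"]


def resolve_receptors(
    receptors_to_include: Optional[Sequence[str]] = ("TCR", "BCR"),
) -> list[str]:
    """Hypothetical helper: intersect requested receptors with those VDJDB offers.

    Assumption: None means "use every available receptor".
    """
    if receptors_to_include is None:
        return list(AVAILABLE_RECEPTORS)
    unknown = set(receptors_to_include) - {"TCR", "BCR"}
    if unknown:
        raise ValueError(f"Unsupported receptor types: {sorted(unknown)}")
    # Keep the caller's order while dropping receptor types VDJDB does not provide.
    return [r for r in receptors_to_include if r in AVAILABLE_RECEPTORS]


print(resolve_receptors())  # ['TCR']
```

Rejecting unknown names early, instead of silently ignoring them, is a design choice of this sketch.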
Methods
__init__(bc[, cache_dir, ...]) – Initializes the BaseAdapter instance.
create_anndata() – Creates an AnnData object from the AIRR cell data and saves it to a file in the cache directory.
get_edges() – Generates edges for the VDJdb data.
get_latest_release(bc) – Retrieves the latest release of the VDJDB database from GitHub.
get_nodes() – Generates nodes for the VDJdb data.
read_table(bc, table_path, receptors[, test]) – Reads and processes the VDJdb table from the downloaded database file.
set_metadata([version, source_url, ...]) – Sets the metadata for the adapter.
Attributes
DB_DIR – Directory name for the downloaded database.
DB_FNAME – File name of the database.
DB_NAME – Name of the database.
REPO_NAME – GitHub repository name for the VDJDB database.
airr_cells – Property to get the list of AIRR cells.
available_receptors – Receptor types available in VDJDB.
cache_dir – Property to get the cache directory.
db_name – Property to get the database name.
metadata – Property to get the adapter metadata.
receptors – Property to get the available receptor types.
table – Property to get the data table.
- _process_single_chain(df, chain_type)
Process single chain data (TRA or TRB only).
- Parameters:
df – The input DataFrame containing the single chain data.
chain_type – The type of chain, either “tra” or “trb”.
- Returns:
A DataFrame with the processed single chain data.
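A minimal sketch of what single-chain processing can look like, using plain dictionaries instead of a DataFrame so it runs without pandas. It assumes the VDJdb column names gene, cdr3, v.segm, j.segm and antigen.epitope; the prefixed output keys (tra_cdr3, trb_v_gene, ...) are hypothetical and not taken from the adapter.

```python
def process_single_chain(rows: list[dict], chain_type: str) -> list[dict]:
    """Hypothetical sketch: keep rows of one chain and prefix chain-specific fields."""
    if chain_type not in ("tra", "trb"):
        raise ValueError("chain_type must be 'tra' or 'trb'")
    gene = chain_type.upper()  # assumed: VDJdb's "gene" column holds "TRA" / "TRB"
    out = []
    for row in rows:
        if row["gene"] != gene:
            continue  # skip rows belonging to the other chain
        out.append({
            f"{chain_type}_cdr3": row["cdr3"],
            f"{chain_type}_v_gene": row["v.segm"],
            f"{chain_type}_j_gene": row["j.segm"],
            "epitope": row["antigen.epitope"],
        })
    return out
```

Prefixing by chain keeps TRA and TRB fields distinguishable if the single-chain results are later combined with paired records.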
- _transform_paired_data_efficient(df)
Efficient transformation intended to handle all cases correctly. It pairs TRA and TRB chains by complex.id, which is required because the two chains of a complex are stored on separate rows in the raw database.
- Parameters:
df – The input DataFrame containing the VDJdb data.
- Returns:
A DataFrame with the transformed paired data.
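The pairing step can be sketched with the standard library: rows sharing a complex.id are merged into one record carrying both chains. This assumes the VDJdb convention that complex.id "0" marks an unpaired, single-chain entry; the function name pair_chains and its output keys are hypothetical.

```python
from collections import defaultdict


def pair_chains(rows: list[dict]) -> list[dict]:
    """Hypothetical sketch: merge TRA and TRB rows that share a complex.id.

    Assumption: complex.id == "0" marks a record with no paired partner.
    """
    complexes: dict[str, dict] = defaultdict(dict)
    singles = []
    for row in rows:
        cid = row["complex.id"]
        chain = row["gene"].lower()  # "tra" or "trb"
        if cid == "0":
            # Unpaired chain: keep it as its own single-chain record.
            singles.append({f"{chain}_cdr3": row["cdr3"], "epitope": row["antigen.epitope"]})
            continue
        # Both chains of a complex write into the same record.
        record = complexes[cid]
        record[f"{chain}_cdr3"] = row["cdr3"]
        record["epitope"] = row["antigen.epitope"]
    paired = [{"complex_id": cid, **rec} for cid, rec in complexes.items()]
    return paired + singles
```

Grouping in a single pass avoids a self-join; a complex that lists only one chain still yields a (partial) record.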
- get_edges()
Generates edges for the VDJdb data.
This method yields edge data for chain 1 to chain 2, chain 1 to epitope, and chain 2 to epitope.
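The three edge kinds described above can be sketched as a generator over paired records. The (source_id, target_id, label, properties) tuple layout is an assumption about what BioCypher accepts here, and the ID prefixes and labels (tra_to_trb, tcr_binds_epitope) are hypothetical placeholders, not the adapter's schema.

```python
from typing import Iterator

Edge = tuple[str, str, str, dict]


def iter_edges(records: list[dict]) -> Iterator[Edge]:
    """Hypothetical sketch: yield chain-to-chain and chain-to-epitope edges."""
    for rec in records:
        tra = rec.get("tra_cdr3")
        trb = rec.get("trb_cdr3")
        epitope = rec.get("epitope")
        # Chain 1 (TRA) to chain 2 (TRB), only when both chains are known.
        if tra and trb:
            yield (f"tra:{tra}", f"trb:{trb}", "tra_to_trb", {})
        # Each available chain to the epitope it was recorded against.
        if tra and epitope:
            yield (f"tra:{tra}", f"epitope:{epitope}", "tcr_binds_epitope", {})
        if trb and epitope:
            yield (f"trb:{trb}", f"epitope:{epitope}", "tcr_binds_epitope", {})
```

Yielding lazily matches the generator style the method description implies and keeps memory flat for large tables.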
- get_latest_release(bc)
Retrieves the latest release of the VDJDB database from GitHub.
- Parameters:
bc (BioCypher) – An instance of the BioCypher class.
- Return type:
- Returns:
The file path of the downloaded database.
- Raises:
FileNotFoundError – If the database file cannot be found after downloading.
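GitHub's REST API exposes a repository's latest published release at /repos/{owner}/{repo}/releases/latest, returning JSON with tag_name and an assets list whose entries carry name and browser_download_url. Whether get_latest_release uses this endpoint is not stated here; the sketch below only shows how the URL for REPO_NAME could be built and a download link chosen. The "first .zip asset" rule is a hypothetical assumption, and no network request is made.

```python
import json

REPO_NAME = "antigenomics/vdjdb-db"  # documented class attribute


def latest_release_url(repo: str = REPO_NAME) -> str:
    # GitHub REST API endpoint for a repository's latest published release.
    return f"https://api.github.com/repos/{repo}/releases/latest"


def pick_zip_asset(release_json: str) -> str:
    """Hypothetical rule: return the download URL of the first .zip asset."""
    release = json.loads(release_json)
    for asset in release.get("assets", []):
        if asset["name"].endswith(".zip"):
            return asset["browser_download_url"]
    raise ValueError(f"No .zip asset in release {release.get('tag_name')!r}")
```

Separating URL construction from payload parsing lets the parsing logic be tested offline against a saved response.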
- get_nodes()
Generates nodes for the VDJdb data.
This method yields node data for chain 1, chain 2, and epitopes.
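Because the same CDR3 or epitope can appear in many VDJdb records, node generation usually needs de-duplication. The sketch below yields each node once as an (id, label, properties) tuple; that layout is an assumption about BioCypher's input, and the ID prefixes and labels are hypothetical placeholders.

```python
from typing import Iterator

Node = tuple[str, str, dict]

# (record key, id prefix, node label); labels are hypothetical placeholders.
NODE_FIELDS = (
    ("tra_cdr3", "tra", "tra_sequence"),
    ("trb_cdr3", "trb", "trb_sequence"),
    ("epitope", "epitope", "epitope"),
)


def iter_nodes(records: list[dict]) -> Iterator[Node]:
    """Hypothetical sketch: yield each chain and epitope node exactly once."""
    seen: set[str] = set()
    for rec in records:
        for key, prefix, label in NODE_FIELDS:
            value = rec.get(key)
            if not value:
                continue  # single-chain records lack the other chain
            node_id = f"{prefix}:{value}"
            if node_id in seen:  # already emitted from an earlier record
                continue
            seen.add(node_id)
            yield (node_id, label, {"sequence": value})
```

The prefixed IDs keep a TRA and a TRB node distinct even if their CDR3 strings happened to coincide.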
- read_table(bc, table_path, receptors, test=False)
Reads and processes the VDJdb table from the downloaded database file.
- Parameters:
- Return type:
- Returns:
A DataFrame containing the processed table data.
- Raises:
FileNotFoundError – If the table file cannot be found.
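A stdlib sketch of reading the tab-separated VDJdb table, raising FileNotFoundError for a missing file as documented. It drops the bc argument, assumes a "gene" column with TRA/TRB values, and introduces two hypothetical pieces: a RECEPTOR_CHAINS mapping from receptor type to chain values, and a test_rows cap used when test=True.

```python
import csv
from pathlib import Path

# Hypothetical mapping from receptor type to the VDJdb chain values kept for it.
RECEPTOR_CHAINS = {"TCR": {"TRA", "TRB"}}


def read_table(table_path: str, receptors: list[str], test: bool = False,
               test_rows: int = 100) -> list[dict]:
    """Hypothetical sketch: load VDJdb rows for the requested receptor types."""
    path = Path(table_path)
    if not path.is_file():
        raise FileNotFoundError(f"VDJdb table not found: {path}")
    # Union of chain values allowed by every requested receptor type.
    keep = set().union(*(RECEPTOR_CHAINS.get(r, set()) for r in receptors))
    rows = []
    with path.open(newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if row["gene"] in keep:
                rows.append(row)
                # In test mode, stop early to keep runs fast.
                if test and len(rows) >= test_rows:
                    break
    return rows
```

Streaming with csv.DictReader and stopping early means test mode never reads the whole file.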
- DB_DIR = 'vdjdb_latest'
Directory name for the downloaded database.
- DB_FNAME = 'vdjdb.txt'
File name of the database.
- DB_NAME: str = 'VDJDB'
Name of the database.
- REPO_NAME = 'antigenomics/vdjdb-db'
GitHub repository name for the VDJDB database.
- available_receptors: list[str] = ['TCR']
Receptor types available in VDJDB.
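DB_DIR and DB_FNAME suggest that the database file is expected at a fixed location under the cache directory. The sketch assumes the layout cache_dir/DB_DIR/DB_FNAME and a fallback to the current working directory when cache_dir is None; both are assumptions, and db_file_path and require_db_file are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional

DB_DIR = "vdjdb_latest"  # documented class attribute
DB_FNAME = "vdjdb.txt"   # documented class attribute


def db_file_path(cache_dir: Optional[str]) -> Path:
    """Hypothetical sketch: cache_dir/DB_DIR/DB_FNAME (CWD fallback is assumed)."""
    base = Path(cache_dir) if cache_dir is not None else Path.cwd()
    return base / DB_DIR / DB_FNAME


def require_db_file(cache_dir: Optional[str]) -> Path:
    """Return the database path, or raise FileNotFoundError if it is missing."""
    path = db_file_path(cache_dir)
    if not path.is_file():
        raise FileNotFoundError(f"Expected VDJdb file at {path}")
    return path
```

Checking for the file explicitly gives a clear error message, in the spirit of the FileNotFoundError documented for get_latest_release and read_table.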