iggytop.adapters.base_adapter.BaseAdapter

class iggytop.adapters.base_adapter.BaseAdapter(bc, cache_dir=None, receptors_to_include=('TCR', 'BCR'), test=False, filter_10x=False)

Base class for all adapters.

This class provides the basic structure and functionality shared by Iggytop adapters. It initializes an adapter by calling the corresponding functions for downloading and reading the data from the source. It also provides methods for generating BioCypher nodes and edges from the data.

Variables:
  • table (pd.DataFrame) – The data table read from the source.

  • DB_NAME (str) – Name of the database. Must be defined in subclasses.

  • available_receptors (list[str]) – List of receptor types available in the database. Must be defined in subclasses.

Parameters:
  • bc (BioCypher) – An instance of the BioCypher class.

  • cache_dir (str | None) – Directory to cache data. Defaults to None.

  • receptors_to_include (Optional[Sequence[Literal['TCR', 'BCR']]]) – Receptors to include. Defaults to ('TCR', 'BCR').

  • test (bool) – Whether to run in test mode. Defaults to False.

  • filter_10x (bool) – Whether to filter out 10X Genomics datasets. Defaults to False.

__init__(bc, cache_dir=None, receptors_to_include=('TCR', 'BCR'), test=False, filter_10x=False)

Initializes the BaseAdapter instance.

Parameters:
  • bc (BioCypher) – An instance of the BioCypher class.

  • cache_dir (str | None) – Directory to cache data. Defaults to None.

  • receptors_to_include (Optional[Sequence[Literal['TCR', 'BCR']]]) – Receptors to include. Defaults to ('TCR', 'BCR').

  • test (bool) – Whether to run in test mode. Defaults to False.

  • filter_10x (bool) – Whether to filter out 10X Genomics datasets. Defaults to False.

Methods

__init__(bc[, cache_dir, ...])

Initializes the BaseAdapter instance.

create_anndata()

Creates an AnnData object from the AIRR cell data and saves it to a file in the cache directory.

get_edges()

Abstract method to generate BioCypher edges from the data.

get_latest_release(bc)

Abstract method to get the latest release of the data.

get_nodes()

Abstract method to generate BioCypher nodes from the data.

read_table(bc, table_path, receptors[, test])

Abstract method to read and harmonize the data table from the source.

set_metadata([version, source_url, ...])

Sets the metadata for the adapter.

Attributes

airr_cells

Property to get the list of AIRR cells.

cache_dir

Property to get the cache directory.

db_name

Property to get the database name.

metadata

Property to get the adapter metadata.

receptors

Property to get the available receptor types.

table

Property to get the data table.

DB_NAME

available_receptors

_generate_edges_from_table(source_subset_cols, target_subset_cols, source_unique_cols=None, target_unique_cols=None)

Generates BioCypher edges from the data table.

The source_unique_cols and target_unique_cols select the rows that contain relevant information. They do NOT correspond to the unique identifier. The unique identifier is built from the unique columns plus the V gene (if available) for TCR chains.

Parameters:
  • source_subset_cols (list[str]) – List of columns for the source node.

  • target_subset_cols (list[str]) – List of columns for the target node.

  • source_unique_cols (list[str] | None) – List of unique columns for the source node. Defaults to None.

  • target_unique_cols (list[str] | None) – List of unique columns for the target node. Defaults to None.

Yields:

A tuple containing the edge ID, source ID, target ID, edge type, and properties.
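
As a rough illustration of the tuple shape described above, the following sketch builds edges from a small in-memory table. It is not Iggytop's implementation: the column names, the ID format, the helper names (`make_id`, `generate_edges`), and the use of plain dicts instead of a pandas DataFrame are all assumptions made for the example.

```python
# Hypothetical harmonized rows; the column names are illustrative only.
ROWS = [
    {"cdr3": "CASSLGTDTQYF", "v_gene": "TRBV7-9", "epitope": "GILGFVFTL"},
    {"cdr3": "CASSLGTDTQYF", "v_gene": "TRBV7-9", "epitope": "GILGFVFTL"},  # repeated pair
    {"cdr3": "CASSIRSSYEQYF", "v_gene": None, "epitope": "NLVPMVATV"},  # no V gene
    {"cdr3": "CASSPGQGNEQFF", "v_gene": "TRBV5-1", "epitope": None},  # no target
]


def make_id(row, unique_cols, v_gene_col=None):
    """Build an ID from unique_cols plus the V gene if present (assumed format)."""
    # Rows with an empty unique column carry no usable information.
    if any(row.get(col) is None for col in unique_cols):
        return None
    parts = [str(row[col]) for col in unique_cols]
    if v_gene_col and row.get(v_gene_col) is not None:
        parts.append(str(row[v_gene_col]))
    return ":".join(parts)


def generate_edges(rows, source_unique_cols, target_unique_cols, edge_type,
                   source_v_gene_col=None):
    """Yield (edge_id, source_id, target_id, edge_type, properties) tuples."""
    seen = set()
    for row in rows:
        source_id = make_id(row, source_unique_cols, source_v_gene_col)
        target_id = make_id(row, target_unique_cols)
        if source_id is None or target_id is None:
            continue  # one end of the edge is missing
        edge_id = f"{source_id}->{target_id}"
        if edge_id in seen:
            continue  # emit each source/target pair once
        seen.add(edge_id)
        yield edge_id, source_id, target_id, edge_type, {}


edges = list(
    generate_edges(ROWS, ["cdr3"], ["epitope"], "chain_to_epitope",
                   source_v_gene_col="v_gene")
)
```

In this sketch the repeated pair collapses into one edge and the row without an epitope produces no edge at all.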

_generate_nodes_from_table(subset_cols, unique_cols=None, property_cols=None)

Generates BioCypher nodes from the data table.

The unique_cols select the rows that contain relevant information. They do NOT correspond to the unique identifier. The unique identifier is built from unique_cols plus the V gene (if available) for TCR chains.

Parameters:
  • subset_cols (list[str]) – List of columns to subset the table.

  • unique_cols (list[str] | None) – List of columns to check for uniqueness. Defaults to None.

  • property_cols (list[str] | None) – List of columns to include as properties. Defaults to None.

Yields:

A tuple containing the node ID, node type, and properties.
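
To make the roles of the three column lists concrete, here is a minimal sketch that works on a list of dicts rather than a DataFrame. The function name `generate_nodes`, the `v_gene_col` argument, the column names, and the `type:part:part` ID format are assumptions for illustration; the actual method may differ in all of these.

```python
# Hypothetical harmonized rows; the column names are illustrative only.
ROWS = [
    {"cdr3": "CASSLGTDTQYF", "v_gene": "TRBV7-9", "j_gene": "TRBJ2-3", "species": "human"},
    {"cdr3": "CASSLGTDTQYF", "v_gene": "TRBV7-9", "j_gene": "TRBJ2-3", "species": "human"},
    {"cdr3": None, "v_gene": "TRBV5-1", "j_gene": "TRBJ1-1", "species": "human"},
    {"cdr3": "CASSIRSSYEQYF", "v_gene": None, "j_gene": "TRBJ2-7", "species": "mouse"},
]


def generate_nodes(rows, node_type, subset_cols, unique_cols=None,
                   property_cols=None, v_gene_col="v_gene"):
    """Yield (node_id, node_type, properties) tuples."""
    unique_cols = unique_cols or subset_cols
    property_cols = property_cols or []
    seen = set()
    for row in rows:
        # Restrict the row to the columns relevant for this node type.
        sub = {col: row.get(col) for col in subset_cols}
        # unique_cols select rows: skip any row where one of them is empty.
        if any(sub.get(col) is None for col in unique_cols):
            continue
        # The ID combines unique_cols with the V gene when one is available.
        parts = [str(sub[col]) for col in unique_cols]
        v_gene = sub.get(v_gene_col)
        if v_gene is not None and v_gene_col not in unique_cols:
            parts.append(str(v_gene))
        node_id = f"{node_type}:" + ":".join(parts)
        if node_id in seen:
            continue  # emit each node once
        seen.add(node_id)
        yield node_id, node_type, {col: sub.get(col) for col in property_cols}


nodes = list(
    generate_nodes(
        ROWS,
        "tcr_beta",
        subset_cols=["cdr3", "v_gene", "j_gene"],
        unique_cols=["cdr3"],
        property_cols=["v_gene", "j_gene"],
    )
)
```

Here the row with no CDR3 is skipped because `cdr3` is in `unique_cols`, while the row with no V gene still yields a node whose ID simply lacks the V gene part.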

create_anndata()

Creates an AnnData object from the AIRR cell data and saves it to a file in the cache directory.

Return type:

None

abstractmethod get_edges()

Abstract method to generate BioCypher edges from the data.

This method is intended to call _generate_edges_from_table with the right parameters for each edge type. The required parameters depend on the adapter used.

Yields:

tuple – A BioCypher edge (id, source, target, type, properties).

abstractmethod get_latest_release(bc)

Abstract method to get the latest release of the data.

Parameters:

bc (BioCypher) – An instance of the BioCypher class.

Return type:

str | tuple[str, ...]

Returns:

Path to the latest release file(s).

abstractmethod get_nodes()

Abstract method to generate BioCypher nodes from the data.

This method is intended to call _generate_nodes_from_table with the right parameters for each node type. The required parameters depend on the adapter used.

Yields:

tuple – A BioCypher node (id, type, properties).
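
The delegation pattern described for get_nodes can be sketched as follows. `SketchBase` is a stand-in written only for this example and is NOT the real `BaseAdapter`; its `_generate_nodes_from_table` stub, the `ToyAdapter` subclass, the column names, and the choice of node type are all assumptions.

```python
from abc import ABC, abstractmethod


class SketchBase(ABC):
    """Stand-in for BaseAdapter, written only for this example."""

    def __init__(self, rows):
        self.rows = rows  # list of dicts standing in for the data table

    def _generate_nodes_from_table(self, subset_cols, unique_cols=None,
                                   property_cols=None):
        # Simplified stub: one node per distinct combination of unique_cols.
        unique_cols = unique_cols or subset_cols
        property_cols = property_cols or []
        seen = set()
        for row in self.rows:
            key = tuple(row.get(col) for col in unique_cols)
            if None in key or key in seen:
                continue
            seen.add(key)
            node_id = ":".join(str(value) for value in key)
            # Using the first unique column as the node type is an assumption.
            yield node_id, unique_cols[0], {col: row.get(col) for col in property_cols}

    @abstractmethod
    def get_nodes(self):
        """Yield (id, type, properties) tuples."""


class ToyAdapter(SketchBase):
    def get_nodes(self):
        # One delegated call per node type, each with adapter-specific columns.
        yield from self._generate_nodes_from_table(
            ["cdr3", "v_gene"], unique_cols=["cdr3"], property_cols=["v_gene"]
        )
        yield from self._generate_nodes_from_table(
            ["epitope"], unique_cols=["epitope"]
        )


rows = [
    {"cdr3": "CASSLGTDTQYF", "v_gene": "TRBV7-9", "epitope": "GILGFVFTL"},
    {"cdr3": "CASSIRSSYEQYF", "v_gene": "TRBV19", "epitope": "GILGFVFTL"},
]
nodes = list(ToyAdapter(rows).get_nodes())
```

Because `get_nodes` is declared with `abstractmethod`, Python refuses to instantiate `SketchBase` directly (it raises `TypeError`); only subclasses that implement the method can be created.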

abstractmethod read_table(bc, table_path, receptors, test=False)

Abstract method to read and harmonize the data table from the source.

Parameters:
  • bc (BioCypher) – An instance of the BioCypher class.

  • table_path (str | tuple[str, ...]) – Path to the data table file(s).

  • receptors (list[str]) – List of receptor types to include in the table.

  • test (bool) – Whether to run in test mode. Defaults to False.

Return type:

DataFrame

Returns:

The data table.
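
Since table_path may be either a single string or a tuple of strings, an implementation of read_table may find it convenient to normalize the argument first. The helper below is a hypothetical sketch (the name `as_path_tuple` and the example paths are not part of Iggytop):

```python
from typing import Tuple, Union


def as_path_tuple(table_path: Union[str, Tuple[str, ...]]) -> Tuple[str, ...]:
    """Wrap a single path in a tuple; return a tuple of paths unchanged."""
    if isinstance(table_path, str):
        return (table_path,)
    return tuple(table_path)


# Hypothetical paths, used only to show both accepted shapes.
single = as_path_tuple("cache/release.tsv")
multiple = as_path_tuple(("cache/tcr.tsv", "cache/bcr.tsv"))
```

After normalization, the rest of the reader can loop over the paths without branching on the argument type.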

set_metadata(version=None, source_url=None, previous_version=None)

Sets the metadata for the adapter.

Parameters:
  • version (str) – The version of the database. Defaults to None.

  • source_url (str) – The URL of the source. Defaults to None.

  • previous_version (str) – The version of the database in the previous release. Defaults to None.

DB_NAME: str
property airr_cells: list[AirrCell] | None

Property to get the list of AIRR cells.

Returns:

The list of AIRR cells.

available_receptors: list[str]
property cache_dir: str

Property to get the cache directory.

Returns:

The cache directory.

property db_name: str

Property to get the database name.

Returns:

The database name.

property metadata: dict[str, Any]

Property to get the adapter metadata.

Returns:

The metadata dictionary.

property receptors: list[str]

Property to get the available receptor types.

Returns:

List of receptor types.

property table: DataFrame

Property to get the data table. Reads the table if not already read.

Returns:

The data table.
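
The "reads the table if not already read" behavior is a lazy-loading pattern. A minimal sketch of how such a property could work is shown below; the class `LazyTableSketch`, its `read_table` callable, and the fake reader are assumptions for illustration and do not reproduce the actual BaseAdapter code.

```python
class LazyTableSketch:
    """Illustrates a property that loads its data on first access."""

    def __init__(self, read_table):
        self._read_table = read_table  # callable that loads the data
        self._table = None  # nothing has been read yet

    @property
    def table(self):
        # Compare against None explicitly instead of testing truthiness.
        if self._table is None:
            self._table = self._read_table()
        return self._table


calls = []


def fake_reader():
    """Stand-in for an expensive download-and-parse step."""
    calls.append("read")
    return [{"cdr3": "CASSLGTDTQYF"}]


adapter = LazyTableSketch(fake_reader)
first = adapter.table   # triggers the read
second = adapter.table  # returns the cached object
```

The explicit `is None` check matters when the cached value is a pandas DataFrame, because evaluating a DataFrame in a boolean context raises `ValueError`.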