Usage#
Using Iggytop#
The iggytop package is not yet published, so if it was not installed automatically when running uv sync, you can install it manually with
pip install .
To get started right away, try running
uv run create_knowledge_graph.py
which creates a knowledge graph from all available databases, converts it to AIRR format, and saves it to a JSON file.
To use just a subset of the available databases, use the adapters_to_include parameter of io.create_knowledge_graph() (see below).
In Detail#
This section goes through all the steps involved in calling the io.create_knowledge_graph() function. This is, in principle, all that create_knowledge_graph.py does. It also covers the most important components used.
Imports#
Iggytop is built on top of BioCypher, so both the BioCypher package and the iggytop package must be installed in the active Python environment. Each adapter in Iggytop is defined in a separate class (e.g. VDJDBAdapter) that uses the Base adapter as its base class.
Parameters#
Please refer to the API section of the documentation for the available parameters of io.create_knowledge_graph(). They can be used to change the cache directory, to change the scope of the datasets being integrated, and to set the output format.
BioCypher Knowledge Graph#
A new BioCypher instance is initialized using config/biocypher_config.yaml, which contains the required parameters, and config/schema_config.yaml, which defines the ontology used for the Iggytop graph.
Adapters#
When an instance of any adapter class is created, it initializes by downloading the data from the source database (unless the data is already in the cache). The data, which is in table format, is then converted (still as a table) to match the Iggytop requirements:
Each row represents a TCR-epitope pair
Missing values are None
The column names are converted to a standardized set
Only the columns with standardized names are kept
The amino acid sequences and gene names are harmonized
Epitopes are labeled by their IRI if possible. This is done using the IEDB Database API
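The table-level steps above (renaming columns, dropping unmapped columns, and mapping missing values to None) can be sketched in plain Python. This is only an illustration under stated assumptions: the column mapping, the missing-value markers, and the function name standardize_rows are hypothetical and are not taken from the Iggytop code base. Sequence and gene-name harmonization and the IEDB IRI lookup are deliberately left out.

```python
# Hypothetical mapping from one source's column names to a standardized set.
# The real Iggytop column names may differ.
COLUMN_MAP = {
    "CDR3": "junction_aa",
    "V": "v_call",
    "J": "j_call",
    "Epitope": "epitope_sequence",
}

# Hypothetical markers that a source might use for missing values.
MISSING_MARKERS = {"", "NA", "N/A", "nan", "-"}


def standardize_rows(rows, column_map=COLUMN_MAP):
    """Rename columns, drop unmapped columns, and map missing values to None."""
    standardized = []
    for row in rows:
        clean = {}
        for source_name, standard_name in column_map.items():
            value = row.get(source_name)
            if isinstance(value, str):
                value = value.strip()
                if value in MISSING_MARKERS:
                    value = None
            clean[standard_name] = value
        standardized.append(clean)
    return standardized


raw_rows = [
    {"CDR3": "CASSLAPGATNEKLFF", "V": "TRBV7-9", "J": "NA",
     "Epitope": "GILGFVFTL", "Score": "3"},
    {"CDR3": " CASSIRSSYEQYF ", "V": "", "J": "TRBJ2-7",
     "Epitope": "GILGFVFTL", "Score": "1"},
]

table = standardize_rows(raw_rows)
for row in table:
    print(row)
```

Every output row has exactly the standardized keys, so tables from different sources can be processed by the same downstream code regardless of how each source named its columns.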
Generation of the Knowledge Graph#
For each adapter in the list:
The nodes are generated by the Base adapter _generate_nodes_from_table() method.
The edges are generated by the Base adapter _generate_edges_from_table() method.
Both are then added to the BioCypher graph. Please refer to the BioCypher documentation to understand the roles of the translator and deduplicator. The node and edge label generation (implying uniqueness) is described here.
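As a rough sketch of what generating nodes and edges from a table involves, the example below turns a small standardized table into BioCypher-style tuples: nodes as (id, label, properties) and edges as (id, source id, target id, label, properties). The functions generate_nodes and generate_edges, the ID scheme, and the labels are hypothetical stand-ins, not the actual Base adapter methods. Duplicates are filtered by hand here only to keep the sketch self-contained; in Iggytop that role belongs to BioCypher.

```python
def generate_nodes(table):
    """Yield one (id, label, properties) tuple per distinct TCR and epitope."""
    seen = set()
    for row in table:
        candidates = [
            (f"tcr:{row['junction_aa']}", "tcr sequence",
             {"junction_aa": row["junction_aa"]}),
            (f"epitope:{row['epitope_sequence']}", "epitope",
             {"sequence": row["epitope_sequence"]}),
        ]
        for node in candidates:
            if node[0] not in seen:  # the node ID defines uniqueness
                seen.add(node[0])
                yield node


def generate_edges(table):
    """Yield one (id, source, target, label, properties) tuple per distinct pair."""
    seen = set()
    for row in table:
        source = f"tcr:{row['junction_aa']}"
        target = f"epitope:{row['epitope_sequence']}"
        edge_id = f"{source}-{target}"
        if edge_id not in seen:
            seen.add(edge_id)
            yield (edge_id, source, target, "tcr to epitope association", {})


table = [
    {"junction_aa": "CASSIRSSYEQYF", "epitope_sequence": "GILGFVFTL"},
    {"junction_aa": "CASSLAPGATNEKLFF", "epitope_sequence": "GILGFVFTL"},
    {"junction_aa": "CASSIRSSYEQYF", "epitope_sequence": "GILGFVFTL"},  # repeated pair
]

nodes = list(generate_nodes(table))
edges = list(generate_edges(table))
print(len(nodes), "nodes,", len(edges), "edges")
```

Because the two TCRs share one epitope and one pair is repeated, three table rows yield three nodes and two edges, which shows why node and edge IDs determine what counts as the same entity.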
Output#
The graph format is set in config/biocypher_config.yaml, which determines what get_kg() returns. For Iggytop, the AIRR format was added to the supported formats.
Note: the conversion from the knowledge graph to AIRR Cell data (tabular) is non-trivial and must be understood well before the resulting data is used in downstream applications. Most importantly, the result is NOT a concatenation of the underlying datasets. For this reason, the conversion is currently a work in progress.
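One way a graph-derived table can differ from a plain concatenation is through merging of records that several sources share. The toy example below illustrates that effect only; the source names, the data, and the flattening logic are hypothetical, and the actual Iggytop AIRR conversion may work quite differently.

```python
from collections import defaultdict

# Hypothetical standardized source tables of (junction_aa, epitope) pairs.
source_a = [("CASSIRSSYEQYF", "GILGFVFTL"), ("CASSLAPGATNEKLFF", "GILGFVFTL")]
source_b = [("CASSIRSSYEQYF", "GILGFVFTL"), ("CASSPGQGNYGYTF", "NLVPMVATV")]

# Graph-style merge: each distinct pair is stored once and remembers its sources.
pair_sources = defaultdict(set)
for name, rows in (("source_a", source_a), ("source_b", source_b)):
    for pair in rows:
        pair_sources[pair].add(name)

# Flatten the merged structure to one row per distinct pair.
flattened = [
    {"junction_aa": tcr, "epitope_sequence": epitope, "sources": sorted(sources)}
    for (tcr, epitope), sources in sorted(
        pair_sources.items(), key=lambda item: item[0]
    )
]

concatenated_row_count = len(source_a) + len(source_b)
print(len(flattened), "merged rows vs", concatenated_row_count, "concatenated rows")
```

In this sketch the pair reported by both sources appears once with two source labels, so row counts and per-source statistics computed on the merged table do not match those of the stacked inputs.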
Additional Examples#
Please check out the tutorials for more use cases.