Usage#

Using Iggytop#

The iggytop package is not yet published, so if it was not installed automatically when running uv sync, you can install it manually with

pip install .

To get started right away, try running

uv run create_knowledge_graph.py

which creates a knowledge graph from all available databases, converts it to AIRR format, and saves it to a JSON file.

To use just a subset of the available databases, use the adapters_to_include parameter of io.create_knowledge_graph() (see below).

In Detail#

This section walks through the steps involved in calling the io.create_knowledge_graph() function, which is in principle all that create_knowledge_graph.py does. It also covers the most important components used.

Imports#

Iggytop is built on top of BioCypher, so both the BioCypher package and the iggytop package must be installed in the active Python environment. Each adapter in Iggytop is defined in a separate class (e.g. VDJDBAdapter) that uses the Base adapter as its base class.

Parameters#

Please refer to the API section of the documentation for the available parameters of io.create_knowledge_graph(). They can be used to change the cache directory, to change the scope of the datasets being integrated, and to set the output format.

BioCypher Knowledge Graph#

A new BioCypher instance is initialized from config/biocypher_config.yaml, which contains the required parameters, and config/schema_config.yaml, which defines the ontology used for the Iggytop graph.

Adapters#

When an instance of any adapter class is created, it initializes by downloading the data from the source database (unless the data is already in the cache). The tabular data is then converted, remaining in table format, to match the iggytop requirements.
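The download-unless-cached behaviour described above can be sketched as follows. This is only an illustration, not Iggytop's actual implementation: the function name load_source_table, the one-CSV-per-source cache layout, and the download callable are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable


def load_source_table(name: str, cache_dir: Path, download: Callable[[], str]) -> list[dict]:
    """Return the rows of a source table, downloading only on a cache miss.

    Hypothetical helper: `download` stands in for whatever fetches the raw
    CSV text from the source database.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{name}.csv"
    if not cached.exists():
        # Cache miss: fetch once and store the raw text on disk.
        cached.write_text(download(), encoding="utf-8")
    with cached.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


calls = []  # records how often the (fake) download is triggered


def fake_download() -> str:
    calls.append(1)
    return "cdr3,epitope\nCASSLGQAYEQYF,GILGFVFTL\n"


with tempfile.TemporaryDirectory() as tmp:
    first = load_source_table("demo", Path(tmp), fake_download)
    second = load_source_table("demo", Path(tmp), fake_download)
```

The second call reads the cached file instead of invoking the download again, which is the reason a configurable cache directory is useful when the graph is rebuilt repeatedly.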

Generation of the Knowledge Graph#

For each adapter in the list:

  • The nodes are generated by the Base adapter _generate_nodes_from_table() method.

  • The edges are generated by the Base adapter _generate_edges_from_table() method.

The resulting nodes and edges are then added to the BioCypher graph. Please refer to the BioCypher documentation to understand the roles of the translator and deduplicator. The node and edge label generation (implying uniqueness) is described here.
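To make the table-to-graph step more concrete, here is a minimal sketch of turning table rows into the tuples BioCypher consumes: nodes as (id, label, properties) and edges as (id, source id, target id, label, properties). It is not the Base adapter's actual _generate_nodes_from_table() or _generate_edges_from_table() code. The column names, labels, and ID scheme are assumptions made for illustration, and duplicate handling, which in BioCypher is the deduplicator's job, is inlined here with a simple set.

```python
from typing import Iterator

# Hypothetical harmonised table: one row per receptor-epitope observation.
rows = [
    {"cdr3": "CASSLGQAYEQYF", "epitope": "GILGFVFTL", "source": "db_a"},
    {"cdr3": "CASSLGQAYEQYF", "epitope": "GILGFVFTL", "source": "db_b"},
    {"cdr3": "CASSIRSSYEQYF", "epitope": "GILGFVFTL", "source": "db_a"},
]


def generate_nodes(table: list[dict]) -> Iterator[tuple]:
    """Yield (id, label, properties) tuples, once per unique node ID."""
    seen = set()
    # Labels and column names below are illustrative assumptions.
    for row in table:
        for label, column in (("tcr sequence", "cdr3"), ("epitope", "epitope")):
            node_id = f"{column}:{row[column]}"
            if node_id not in seen:
                seen.add(node_id)
                yield node_id, label, {"sequence": row[column]}


def generate_edges(table: list[dict]) -> Iterator[tuple]:
    """Yield (id, source, target, label, properties) tuples, once per unique pair."""
    seen = set()
    for row in table:
        source_id = f"cdr3:{row['cdr3']}"
        target_id = f"epitope:{row['epitope']}"
        edge_id = f"{source_id}-{target_id}"
        if edge_id not in seen:
            seen.add(edge_id)
            yield edge_id, source_id, target_id, "binds to", {}


nodes = list(generate_nodes(rows))
edges = list(generate_edges(rows))
```

Because the IDs are derived from the sequences themselves, the same receptor reported by two databases maps to one node, which is what "implying uniqueness" amounts to in this simplified setting.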

Output#

The graph format is defined in config/biocypher_config.yaml, which determines what get_kg() returns. For Iggytop, the AIRR format was added to the compatible formats.

Note that the conversion from the knowledge graph to AIRR Cell data (tabular) is non-trivial and must be understood well before the resulting data is used in downstream applications. Most importantly, the result is NOT a concatenation of the underlying datasets. The conversion is currently a work in progress.
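One way to see why a graph-derived table differs from a plain concatenation is the toy comparison below. It assumes that merging identical receptor-epitope pairs across sources is the main difference; the actual Iggytop conversion may involve more than that, and the table layout and field names used here are hypothetical.

```python
from collections import defaultdict

# Two hypothetical source tables of (cdr3, epitope) pairs.
db_a = [("CASSLGQAYEQYF", "GILGFVFTL"), ("CASSIRSSYEQYF", "GILGFVFTL")]
db_b = [("CASSLGQAYEQYF", "GILGFVFTL")]

# Plain concatenation keeps every row, including the duplicate pair.
concatenated = db_a + db_b

# Graph-like view: each unique pair appears once and remembers its sources.
pair_sources = defaultdict(set)
for name, table in (("db_a", db_a), ("db_b", db_b)):
    for pair in table:
        pair_sources[pair].add(name)

graph_rows = [
    {"cdr3": cdr3, "epitope": epitope, "sources": sorted(sources)}
    for (cdr3, epitope), sources in pair_sources.items()
]
```

Here the concatenation has three rows while the graph-derived table has two, so row counts and per-row provenance in the exported data should not be expected to match any single source database.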

Additional Examples#

Please check out the tutorials for more use cases.