# Usage

## Using Iggytop

The iggytop package is not yet published. If it was not installed automatically when running `uv sync`, you can install it manually with

`pip install .`

To get started right away, try running

`uv run create_knowledge_graph.py`

which creates a knowledge graph from all available databases, converts it to AIRR format, and saves it to a JSON file.

To use just a subset of the available databases, use the `adapters_to_include` parameter of {meth}`io.create_knowledge_graph()<iggytop.io.create_knowledge_graph>` (see below).
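The idea of restricting the graph to a subset of databases can be sketched as follows. This is an illustration only: `AVAILABLE_ADAPTERS`, the adapter names in it, and `select_adapters` are hypothetical stand-ins, and treating `None` as "use everything" is an assumption. Check the API reference for the values that `adapters_to_include` actually accepts.

```python
# Hypothetical registry of adapter names; the real names are defined by iggytop.
AVAILABLE_ADAPTERS = ["VDJDB", "MCPAS", "IEDB"]


def select_adapters(adapters_to_include=None):
    """Return the requested adapters, or all of them when nothing is specified."""
    if adapters_to_include is None:
        # Assumption: no selection means "use every available database".
        return list(AVAILABLE_ADAPTERS)
    unknown = sorted(set(adapters_to_include) - set(AVAILABLE_ADAPTERS))
    if unknown:
        raise ValueError(f"Unknown adapters: {unknown}")
    # Keep the registry order so the result is deterministic.
    return [name for name in AVAILABLE_ADAPTERS if name in adapters_to_include]


selected = select_adapters(["IEDB", "VDJDB"])
```

Failing early on an unknown name avoids silently building a graph from fewer databases than intended.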

### In Detail

This section goes through all the steps involved in calling the {meth}`io.create_knowledge_graph()<iggytop.io.create_knowledge_graph>` function, which is in principle all that [create_knowledge_graph.py](https://github.com/biocypher/iggytop/blob/main/create_knowledge_graph.py) does. It also covers the most important components used.

#### Imports
Iggytop is built on top of BioCypher, so both the BioCypher package and the iggytop package must be installed in the active Python environment. Each adapter in Iggytop is defined in a separate class (e.g. {class}`VDJDBAdapter<iggytop.adapters.vdjdb_adapter.VDJDBAdapter>`) that uses {class}`Base adapter <iggytop.adapters.base_adapter.BaseAdapter>` as its base class.

#### Parameters
Please refer to the API section of the documentation for the parameters available in {meth}`io.create_knowledge_graph()<iggytop.io.create_knowledge_graph>`. They can be used to change the cache directory, restrict which datasets are integrated, and set the output format.

#### BioCypher Knowledge Graph
A new BioCypher instance is initialized using `config/biocypher_config.yaml`, which contains the required parameters, and `config/schema_config.yaml`, which defines the [ontology](ontology) used for the Iggytop graph.

#### Adapters
When an instance of any adapter class is created, it initializes by {meth}`downloading <iggytop.adapters.base_adapter.BaseAdapter.get_latest_release>` the data from the source database (unless it is already in the cache). The tabular data is then transformed, remaining a table, to match the iggytop requirements:
- Each row represents a TCR-epitope pair.
- Missing values are `None`.
- The column names are converted to a [standardized set](https://github.com/biocypher/iggytop/blob/main/src/iggytop/adapters/constants.py).
- Only the columns with standardized names are kept.
- The amino acid sequences and gene names are {meth}`harmonized<iggytop.adapters.utils.harmonize_sequences>`.
- Epitopes are {meth}`labeled by their IRI<iggytop.adapters.utils.get_iedb_ids_batch>` if possible. This is done using the [IEDB Database API](https://help.iedb.org/hc/en-us/articles/4402872882189-Immune-Epitope-Database-Query-API-IQ-API).
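The renaming, column filtering, and missing-value rules above can be sketched in plain Python. This is a minimal illustration, not the adapter code itself: `COLUMN_MAP`, `MISSING`, `normalize_rows`, and the column names are hypothetical, and the real standardized names are those defined in `constants.py`. Sequence harmonization and the IEDB lookup are left out.

```python
# Hypothetical mapping from one source's column names to standardized names.
COLUMN_MAP = {
    "CDR3 beta": "trb_cdr3",
    "V beta": "trb_v_gene",
    "Epitope": "epitope_sequence",
}

# Hypothetical markers that a source might use for missing values.
MISSING = {"", "NA", "nan"}


def normalize_rows(rows):
    """Rename known columns, drop all other columns, and map missing values to None."""
    normalized = []
    for row in rows:
        clean = {}
        for source_name, standard_name in COLUMN_MAP.items():
            value = row.get(source_name)
            if isinstance(value, str):
                value = value.strip()
            clean[standard_name] = None if value is None or value in MISSING else value
        normalized.append(clean)
    return normalized


raw = [
    {"CDR3 beta": "CASSIRSSYEQYF", "V beta": "TRBV19", "Epitope": "GILGFVFTL", "Score": "3"},
    {"CDR3 beta": "CASSLGQAYEQYF", "V beta": "NA", "Epitope": "NLVPMVATV", "Score": "1"},
]
table = normalize_rows(raw)
```

Each resulting row still describes one TCR-epitope pair. The `Score` column is dropped because it has no standardized name in this sketch.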

#### Generation of the Knowledge Graph
For each adapter in the list:
- The nodes are generated by the {class}`Base adapter <iggytop.adapters.base_adapter.BaseAdapter>` {meth}`~_generate_nodes_from_table()` method.
- The edges are generated by the {class}`Base adapter <iggytop.adapters.base_adapter.BaseAdapter>` {meth}`~_generate_edges_from_table()` method.

The generated nodes and edges are then added to the BioCypher graph. Please refer to the [BioCypher documentation](https://biocypher.org/) to understand the roles of the translator and deduplicator.
The node and edge label generation (implying uniqueness) is described [here](uniquenes).
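To give a rough picture of what generating nodes and edges from a table means, here is a minimal sketch. BioCypher adapters conventionally yield nodes as `(id, label, properties)` tuples and edges as `(id, source_id, target_id, label, properties)` tuples. Everything else here is hypothetical: the function names, the labels, the column names, and the ID scheme are placeholders and do not reproduce the methods of the base adapter.

```python
def generate_nodes(table):
    """Yield (node_id, label, properties) tuples for every non-missing entity."""
    for row in table:
        for label, key in (("tcr_sequence", "trb_cdr3"), ("epitope", "epitope_sequence")):
            value = row.get(key)
            if value is None:
                continue  # missing values produce no node
            # Duplicates are not removed here.
            yield f"{label}:{value}", label, {key: value}


def generate_edges(table):
    """Yield (edge_id, source_id, target_id, label, properties) for complete pairs."""
    for row in table:
        cdr3, epitope = row.get("trb_cdr3"), row.get("epitope_sequence")
        if cdr3 is None or epitope is None:
            continue  # an edge needs both endpoints
        source_id = f"tcr_sequence:{cdr3}"
        target_id = f"epitope:{epitope}"
        yield f"{source_id}-{target_id}", source_id, target_id, "tcr_binds_epitope", {}


example_table = [
    {"trb_cdr3": "CASSIRSSYEQYF", "epitope_sequence": "GILGFVFTL"},
    {"trb_cdr3": None, "epitope_sequence": "NLVPMVATV"},
]
nodes = list(generate_nodes(example_table))
edges = list(generate_edges(example_table))
```

Deriving the ID from the label and the sequence means the same sequence seen in two source databases gets the same node ID, which is what makes deduplication possible later on.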

#### Output
The graph format is defined in `config/biocypher_config.yaml`, which determines what {meth}`~bc.get_kg()` returns. For Iggytop, the AIRR format was added to the compatible formats.

Note that the conversion from the knowledge graph to AIRR cell data (tabular) is non-trivial and must be well understood before the resulting data is used in downstream applications. Most importantly, the result is NOT a concatenation of the underlying datasets. The conversion is therefore still a work in progress.


## Additional Examples
Please check out the tutorials for more use cases.