iggytop.adapters.utils#
This module contains utility functions for harmonizing data for iggytop.
Functions
| aggregate_unique_joined | Helper function to aggregate unique values into a joined string. |
| deduplicate_and_aggregate | Deduplicates AnnData based on subset_cols and aggregates values in agg_cols. |
| get_file_checksum | Calculates the SHA256 checksum of a file. |
| get_iedb_ids_batch | Retrieve IEDB IDs for multiple epitopes using batched requests. |
| get_mhc_class | Find MHC class information from the MHC gene (allele) names. |
| get_pmids_batch | Retrieve PubMed IDs for multiple IEDB reference IDs using batched requests. |
| get_previous_release_metadata | Fetches metadata from the latest GitHub release of iggytop. |
| get_tissue_source | Standardize tissue source information while staying close to the original values. |
| harmonize_sequences | Preprocesses CDR3 sequences, epitope sequences, and gene names in a harmonized way. |
| normalize_table_strings | Normalize table values to strings or None only. |
| save_airr_cells_csv | Convert list of AirrCell objects to CSV format and save as compressed file. |
| save_airr_cells_json | Save a list of AirrCell objects to a compressed JSON file with auto-generated filename. |
- iggytop.adapters.utils._get_epitope_data(bc, epitopes, base_url, match_type='exact')#
Get epitope data.
- Parameters:
- Return type:
- Returns:
List of epitope data dictionaries
- iggytop.adapters.utils._get_reference_data(bc, reference_ids, base_url)#
Get reference data for PubMed ID mapping.
- iggytop.adapters.utils._is_ig_locus(locus)#
Return True when a chain locus corresponds to BCR/IG chains.
- Return type:
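A minimal sketch of what such a locus check might look like, assuming the AIRR locus codes `IGH`, `IGK`, and `IGL` for immunoglobulin chains. The name `is_ig_locus_sketch` and the handling of missing or lowercase values are assumptions for illustration, not the library's implementation.

```python
# Hypothetical stand-in for _is_ig_locus; the real function may differ.
IG_LOCI = {"IGH", "IGK", "IGL"}  # AIRR locus codes for BCR/IG chains


def is_ig_locus_sketch(locus):
    """Return True for BCR/IG loci, False for TCR loci or missing values."""
    if not isinstance(locus, str):
        return False  # assumption: None and NaN count as "not IG"
    return locus.strip().upper() in IG_LOCI


if __name__ == "__main__":
    for value in ("IGH", "TRB", None):
        print(value, is_ig_locus_sketch(value))
```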
- iggytop.adapters.utils._process_cdr3_with_j_gene(cdr3, species, j_symbol, is_igh)#
Standardize CDR3 with tidytcells, but tolerate malformed J symbols.
- iggytop.adapters.utils._process_epitope_sequence(seq)#
Remove flanking residues in epitope sequences.
- iggytop.adapters.utils._set_up_config(output_format, cache_dir)#
- iggytop.adapters.utils._set_up_schema(cache_dir)#
- iggytop.adapters.utils.aggregate_unique_joined(series, separator='|')#
Helper function to aggregate unique values into a joined string. Warns if the string 'nan' is found.
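The behaviour described here can be illustrated with a small standard-library sketch. It works on a plain iterable rather than a pandas Series, and the name `aggregate_unique_joined_sketch`, the first-seen ordering, and the choice to keep a literal 'nan' string after warning are all assumptions.

```python
import warnings


def aggregate_unique_joined_sketch(values, separator="|"):
    """Join unique, non-missing values in first-seen order."""
    seen = []
    for value in values:
        # Skip real missing values: None and float NaN (NaN != NaN).
        if value is None or (isinstance(value, float) and value != value):
            continue
        text = str(value)
        if text == "nan":
            # Assumption: warn but keep the value, since it may be a data error.
            warnings.warn("Found the literal string 'nan' while aggregating")
        if text not in seen:
            seen.append(text)
    return separator.join(seen)


if __name__ == "__main__":
    print(aggregate_unique_joined_sketch(["VDJdb", "IEDB", "VDJdb", None]))
```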
- iggytop.adapters.utils.deduplicate_and_aggregate(adata, subset_cols, agg_cols, separator='|')#
Deduplicates AnnData based on subset_cols and aggregates values in agg_cols. Uses scirpy airr_context to access TCR-specific columns if needed.
- iggytop.adapters.utils.get_file_checksum(file_path)#
Calculates the SHA256 checksum of a file.
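One common way to compute a SHA256 checksum without loading the whole file into memory is to hash it in fixed-size chunks. The sketch below uses only `hashlib`; the name `file_sha256_sketch`, the chunk size, and the hex-string return value are assumptions and may not match `get_file_checksum` exactly.

```python
import hashlib


def file_sha256_sketch(file_path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Chunked reading keeps memory use constant, which matters for large compressed release files.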
- iggytop.adapters.utils.get_iedb_ids_batch(bc, epitopes, chunk_size=150)#
Retrieve IEDB IDs for multiple epitopes using batched requests.
First tries exact matches, then falls back to substring matches for unmatched epitopes.
- Parameters:
- Return type:
- Returns:
Dictionary mapping epitope sequences to their IEDB IDs (0 if not found)
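The exact-then-substring strategy with chunked requests can be sketched as follows. To keep the example runnable offline, the HTTP call is replaced by an injected `fetch` callable; `fetch`, the record keys `sequence` and `id`, the toy database, and its IDs are all hypothetical and do not reflect the real IEDB API or real IEDB identifiers.

```python
def chunked(items, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def lookup_ids_batch_sketch(epitopes, fetch, chunk_size=150):
    """Map each epitope to an ID, using 0 when nothing matches."""
    ids = {epitope: 0 for epitope in epitopes}
    unique = list(ids)
    # Pass 1: exact matches, one request per chunk.
    for chunk in chunked(unique, chunk_size):
        for record in fetch(chunk, "exact"):
            if record["sequence"] in ids:
                ids[record["sequence"]] = record["id"]
    # Pass 2: substring matches, only for epitopes still unmatched.
    unmatched = [epitope for epitope in unique if ids[epitope] == 0]
    for chunk in chunked(unmatched, chunk_size):
        for record in fetch(chunk, "substring"):
            for epitope in chunk:
                if ids[epitope] == 0 and epitope in record["sequence"]:
                    ids[epitope] = record["id"]
    return ids


# Toy in-memory "database" with made-up IDs, standing in for the remote service.
FAKE_DB = {"GILGFVFTL": 101, "XXNLVPMVATVXX": 202}


def fake_fetch(chunk, match_type):
    records = []
    for sequence, fake_id in FAKE_DB.items():
        for query in chunk:
            exact = match_type == "exact" and query == sequence
            partial = match_type == "substring" and query in sequence
            if exact or partial:
                records.append({"sequence": sequence, "id": fake_id})
    return records


if __name__ == "__main__":
    queries = ["GILGFVFTL", "NLVPMVATV", "AAAA"]
    print(lookup_ids_batch_sketch(queries, fake_fetch, chunk_size=2))
```

Running the second pass only on unmatched epitopes keeps the number of slower substring queries small.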
- iggytop.adapters.utils.get_mhc_class(allele)#
Find MHC class information from the MHC gene (allele) names.
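As a rough illustration of deriving the MHC class from an allele name, the sketch below handles human HLA names only. The function name, the gene sets, and the return labels `"MHCI"` and `"MHCII"` are assumptions; the actual `get_mhc_class` may cover other species and use different labels.

```python
import re

# Assumption: classical and non-classical human class I genes.
CLASS_I_GENES = {"A", "B", "C", "E", "F", "G"}


def mhc_class_sketch(allele):
    """Guess the MHC class of a human HLA allele name, or return None."""
    if not allele:
        return None
    name = allele.strip().upper()
    if name.startswith("HLA-"):
        name = name[len("HLA-"):]
    match = re.match(r"[A-Z]+", name)  # leading letters, e.g. "A" or "DRB"
    if match is None:
        return None
    gene = match.group(0)
    if gene in CLASS_I_GENES:
        return "MHCI"
    if gene.startswith("D"):  # DR, DQ, DP, DM, DO genes
        return "MHCII"
    return None


if __name__ == "__main__":
    for name in ("HLA-A*02:01", "HLA-DRB1*15:01", "H2-Kb"):
        print(name, mhc_class_sketch(name))
```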
- iggytop.adapters.utils.get_pmids_batch(bc, reference_urls, chunk_size=150)#
Retrieve PubMed IDs for multiple IEDB reference IDs using batched requests.
- Parameters:
- Return type:
- Returns:
Dictionary mapping IEDB reference IDs to their PubMed IDs (None if not found)
- iggytop.adapters.utils.get_previous_release_metadata(repo_name='iggytop/iggytop')#
Fetches metadata from the latest GitHub release of iggytop.
- iggytop.adapters.utils.get_tissue_source(tissue)#
Standardize tissue source information while staying close to the original values. The current mapping could be improved.
- iggytop.adapters.utils.harmonize_sequences(bc, table)#
Preprocesses CDR3 sequences, epitope sequences, and gene names in a harmonized way. The following steps are performed:
1. Clean epitope sequences (remove flanking residues)
2. Add IEDB IRI and corresponding antigen information (species and antigen name) where missing
3. Harmonize species terms for antigen species and receptor chain species
4. Normalize VDJ-gene names to IMGT standards
5. Clean CDR3 sequences (normalizes junction_aas)
6. Convert MHC gene names to IMGT (for human)
- Return type:
- iggytop.adapters.utils.normalize_table_strings(table)#
Normalize table values to strings or None only.
- Return type:
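The "strings or None only" rule can be illustrated on a list of row dictionaries. This is a sketch under stated assumptions: the real function's table type is not given here, and the choices to strip whitespace and to treat empty strings, `"nan"`, and `"None"` as missing are guesses.

```python
import math


def normalize_value_sketch(value):
    """Return a stripped string, or None for missing-like values."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    text = str(value).strip()
    # Assumption: these textual placeholders also mean "missing".
    if text == "" or text.lower() in {"nan", "none"}:
        return None
    return text


def normalize_rows_sketch(rows):
    """Apply normalize_value_sketch to every cell of a list of dict rows."""
    return [
        {key: normalize_value_sketch(value) for key, value in row.items()}
        for row in rows
    ]


if __name__ == "__main__":
    print(normalize_rows_sketch([{"v_call": " TRBV7-9 ", "count": 3, "pmid": float("nan")}]))
```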
- iggytop.adapters.utils.save_airr_cells_csv(airr_cells, directory)#
Convert list of AirrCell objects to CSV format and save as compressed file.
- iggytop.adapters.utils.save_airr_cells_json(airrcells, directory, filename=None, metadata=None)#
Save a list of AirrCell objects to a compressed JSON file with auto-generated filename.
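A minimal sketch of writing records to a compressed JSON file with an auto-generated filename. It assumes gzip compression, a timestamp-based filename, a `{"metadata": ..., "cells": ...}` payload layout, and plain dictionaries in place of AirrCell objects; none of these details are confirmed for `save_airr_cells_json`.

```python
import gzip
import json
import os
from datetime import datetime, timezone


def save_records_json_sketch(records, directory, filename=None, metadata=None):
    """Write records plus optional metadata to a gzip-compressed JSON file."""
    os.makedirs(directory, exist_ok=True)
    if filename is None:
        # Hypothetical naming scheme based on the current UTC time.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        filename = f"airr_cells_{stamp}.json.gz"
    path = os.path.join(directory, filename)
    payload = {"metadata": metadata or {}, "cells": list(records)}
    # "wt" opens the gzip stream in text mode so json.dump can write str.
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(payload, handle)
    return path
```

Returning the path lets callers log or checksum the generated file without reconstructing its name.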