2.1.1.2.1.3. enkie.dbs.uniprot
Methods for querying protein data from Uniprot.
2.1.1.2.1.3.1. Module Contents
- enkie.dbs.uniprot.FAMILY_LEVELS = ['superfamily', 'family', 'subfamily', 'subsubfamily', 'other_families']
- enkie.dbs.uniprot.join_protein_ids(ids: Iterable[str]) → str
Join multiple protein identifiers in a single string.
- enkie.dbs.uniprot.clean_and_sort_protein_ids(ids: str) → str
Standardize the format of a string containing multiple protein identifiers.
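The two identifier helpers above can be pictured with a minimal sketch. This is not enkie's implementation: the `;` separator, the deduplication step, and the names `join_ids` and `clean_and_sort_ids` are assumptions made for illustration, since this reference does not state the delimiter or the exact normalization rules.

```python
from typing import Iterable

# Assumption: identifiers are delimited by ";". The delimiter enkie
# actually uses is not stated in this reference.
SEPARATOR = ";"


def join_ids(ids: Iterable[str]) -> str:
    """Join identifiers into a single delimited string (hypothetical helper)."""
    return SEPARATOR.join(ids)


def clean_and_sort_ids(ids: str) -> str:
    """Strip whitespace, drop empty and duplicate entries, then sort.

    Deduplication is an assumption of this sketch, not documented behavior.
    """
    parts = {part.strip() for part in ids.split(SEPARATOR)}
    parts.discard("")
    return join_ids(sorted(parts))
```

A normalized string of this kind is convenient as a dictionary key or DataFrame index, because two spellings of the same identifier set compare equal.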
- enkie.dbs.uniprot.query_protein_data(protein_ids: List[str], columns: List[str]) → pandas.DataFrame
Query data from Uniprot for the given proteins.
- enkie.dbs.uniprot.parse_family_string(families_string: str) → Tuple[str, str, str, str, str]
Extract structured protein family information from a Uniprot protein family annotation.
- Parameters:
families_string (str) – The Uniprot family annotation.
- Returns:
The extracted family information, structured as (superfamily, family, subfamily, subsubfamily, other_families).
- Return type:
Tuple[str, str, str, str, str]
- Raises:
ValueError – If the input string does not have the expected format.
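As a rough illustration of the kind of parsing described above, the sketch below splits an annotation of the form "Belongs to the X superfamily. Y family. Z subfamily." into the five levels. It is not enkie's parser: the function name `parse_family_sketch`, the assumed "Belongs to the " prefix, and the rule that unrecognized or repeated segments go into `other_families` are all assumptions of this example.

```python
from typing import Dict, List, Tuple

FAMILY_LEVELS = ["superfamily", "family", "subfamily", "subsubfamily", "other_families"]
PREFIX = "Belongs to the "  # Assumed prefix of a family annotation.


def parse_family_sketch(families_string: str) -> Tuple[str, str, str, str, str]:
    """Split a family annotation into its levels (hypothetical sketch)."""
    if not families_string.startswith(PREFIX):
        raise ValueError(f"Unexpected family annotation: {families_string!r}")

    # One slot per named level; everything else is collected separately.
    named: Dict[str, str] = {level: "" for level in FAMILY_LEVELS[:-1]}
    others: List[str] = []

    for part in families_string[len(PREFIX):].split(". "):
        part = part.strip().rstrip(".")
        if not part:
            continue
        # The last word of each segment names its level, e.g. "... family".
        name, _, level = part.rpartition(" ")
        if name and level in named and not named[level]:
            named[level] = name
        else:
            others.append(part)

    return (
        named["superfamily"],
        named["family"],
        named["subfamily"],
        named["subsubfamily"],
        "; ".join(others),
    )
```

For example, `parse_family_sketch("Belongs to the protein kinase superfamily. CMGC Ser/Thr protein kinase family. CDC2/CDKX subfamily.")` yields `("protein kinase", "CMGC Ser/Thr protein kinase", "CDC2/CDKX", "", "")`, and a string without the assumed prefix raises `ValueError`.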
- enkie.dbs.uniprot.parse_family_df(annotations: pandas.DataFrame) → pandas.DataFrame
Extract structured protein family information from a DataFrame of Uniprot family annotations.
- Parameters:
annotations (pd.DataFrame) – The Uniprot family annotations.
- Returns:
The extracted family information, structured as (superfamily, family, subfamily, subsubfamily, other_families).
- Return type:
pd.DataFrame
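One way to picture the DataFrame variant is as a row-wise application of a string parser whose tuples are expanded into one column per level. The sketch below assumes pandas is installed; `expand_family_column`, the `family` column name, and `toy_parser` are hypothetical stand-ins and not part of enkie, and the actual layout of the returned DataFrame is not specified in this reference.

```python
from typing import Callable, List, Tuple

import pandas as pd

FAMILY_LEVELS = ["superfamily", "family", "subfamily", "subsubfamily", "other_families"]


def expand_family_column(
    annotations: pd.DataFrame,
    column: str,
    parser: Callable[[str], Tuple[str, ...]],
    names: List[str],
) -> pd.DataFrame:
    """Apply a tuple-returning parser to one column and expand the result.

    The output keeps the input index and has one column per entry in `names`.
    """
    parsed = annotations[column].map(parser)
    return pd.DataFrame(parsed.tolist(), index=annotations.index, columns=names)


def toy_parser(text: str) -> Tuple[str, ...]:
    """Stand-in parser: pads a "a|b|c" style string to five fields."""
    fields = text.split("|")[:5]
    return tuple(fields + [""] * (5 - len(fields)))
```

Called as `expand_family_column(df, "family", toy_parser, FAMILY_LEVELS)`, this returns a frame indexed like `df` with the five level columns. Passing the parser as an argument keeps the expansion step independent of how individual annotation strings are interpreted.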