2.1.1.2.1.3. enkie.dbs.uniprot

Methods for querying protein data from Uniprot.

2.1.1.2.1.3.1. Module Contents

enkie.dbs.uniprot.FAMILY_LEVELS = ['superfamily', 'family', 'subfamily', 'subsubfamily', 'other_families']
enkie.dbs.uniprot.join_protein_ids(ids: Iterable[str]) → str

Join multiple protein identifiers in a single string.

Parameters:

ids (Iterable[str]) – The input identifiers.

Returns:

A string containing the input identifiers in standardized form.

Return type:

str

enkie.dbs.uniprot.clean_and_sort_protein_ids(ids: str) → str

Standardize the format of a string containing multiple protein identifiers.

Parameters:

ids (str) – The input string.

Returns:

A string containing the input identifiers in standardized form.

Return type:

str
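
The "standardized form" returned by join_protein_ids and clean_and_sort_protein_ids is not spelled out on this page. The following is a minimal sketch of one plausible convention, assuming identifiers are deduplicated, sorted, and joined with a semicolon. The names clean_and_sort_ids_sketch, join_ids_sketch, and SEPARATOR are hypothetical and are not part of enkie; the library's actual separator and ordering may differ.

```python
import re
from typing import Iterable

# Hypothetical separator; the real module may use a different one.
SEPARATOR = ";"


def clean_and_sort_ids_sketch(ids: str) -> str:
    """Split on common delimiters, drop blanks and duplicates, sort, rejoin."""
    tokens = re.split(r"[;,\s]+", ids)
    unique = sorted({token for token in tokens if token})
    return SEPARATOR.join(unique)


def join_ids_sketch(ids: Iterable[str]) -> str:
    """Join identifiers, then apply the same standardization."""
    return clean_and_sort_ids_sketch(SEPARATOR.join(ids))


print(join_ids_sketch(["P0A9B2", "P00509", "P0A9B2"]))  # P00509;P0A9B2
print(clean_and_sort_ids_sketch(" Q9Y6K9 , P00509;"))   # P00509;Q9Y6K9
```

A canonical form like this makes two strings that list the same proteins in a different order compare equal, which is useful when the joined string serves as a lookup key.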

enkie.dbs.uniprot.query_protein_data(protein_ids: List[str], columns: List[str]) → pandas.DataFrame

Query data from Uniprot for the given proteins.

Parameters:
  • protein_ids (List[str]) – The query Uniprot identifiers.

  • columns (List[str]) – The data columns to return.

Returns:

The requested protein data.

Return type:

pd.DataFrame
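
How query_protein_data contacts Uniprot is not described here. As an illustration only, the sketch below builds a request URL for the public Uniprot REST search endpoint from a list of identifiers and column names. The endpoint, the accession: query syntax, and the field name protein_families are assumptions about the public API and are not taken from enkie's source; build_query_url_sketch is a hypothetical helper. No request is actually sent.

```python
from typing import List
from urllib.parse import urlencode

# Assumed endpoint of the public Uniprot REST API (not taken from enkie).
UNIPROT_SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_query_url_sketch(protein_ids: List[str], columns: List[str]) -> str:
    """Build a tab-separated search request for the given accessions."""
    # One "accession:<id>" clause per protein, combined with OR.
    query = " OR ".join(f"accession:{pid}" for pid in protein_ids)
    params = {"query": query, "fields": ",".join(columns), "format": "tsv"}
    return f"{UNIPROT_SEARCH_URL}?{urlencode(params)}"


url = build_query_url_sketch(["P00509", "P0A9B2"], ["accession", "protein_families"])
print(url)
```

A tab-separated response of this kind could then be loaded with pandas.read_csv(..., sep="\t") to obtain a DataFrame with one row per protein.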

enkie.dbs.uniprot.parse_family_string(families_string: str) → Tuple[str, str, str, str, str]

Extract structured protein family information from a Uniprot protein family annotation.

Parameters:

families_string (str) – The Uniprot family annotation.

Returns:

The extracted family information, structured as (superfamily, family, subfamily, subsubfamily, other_families).

Return type:

Tuple[str, str, str, str, str]

Raises:

ValueError – If the input string does not have the expected format.
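
The exact annotation format expected by parse_family_string is not documented on this page. The sketch below assumes a comma-separated hierarchy such as "Protein kinase superfamily, CMGC Ser/Thr protein kinase family, CDC2/CDKX subfamily", with additional families after a semicolon. It is not enkie's implementation: parse_family_string_sketch, the suffix table, and the choice to keep the level suffix in each returned name are all hypothetical.

```python
from typing import Tuple

# (suffix, level) pairs; longer suffixes come first so that "subfamily"
# and "superfamily" are not mistaken for a plain "family".
_SUFFIXES = [
    ("sub-subfamily", "subsubfamily"),
    ("subfamily", "subfamily"),
    ("superfamily", "superfamily"),
    ("family", "family"),
]


def parse_family_string_sketch(families_string: str) -> Tuple[str, str, str, str, str]:
    """Return (superfamily, family, subfamily, subsubfamily, other_families)."""
    entries = [e.strip() for e in families_string.split(";") if e.strip()]
    if not entries:
        raise ValueError("Empty family annotation.")
    levels = {"superfamily": "", "family": "", "subfamily": "", "subsubfamily": ""}
    # Only the first entry is split into levels; the rest are kept verbatim.
    for part in (p.strip() for p in entries[0].split(",")):
        for suffix, level in _SUFFIXES:
            if part.lower().endswith(suffix):
                levels[level] = part
                break
        else:
            raise ValueError(f"Unrecognized family level in: {part!r}")
    other = "; ".join(entries[1:])
    return (
        levels["superfamily"],
        levels["family"],
        levels["subfamily"],
        levels["subsubfamily"],
        other,
    )


result = parse_family_string_sketch(
    "Protein kinase superfamily, CMGC Ser/Thr protein kinase family, CDC2/CDKX subfamily"
)
print(result)
```

Levels that do not appear in the annotation are returned as empty strings, so the tuple always has five elements in the order given by FAMILY_LEVELS.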

enkie.dbs.uniprot.parse_family_df(annotations: pandas.DataFrame) → pandas.DataFrame

Extract structured protein family information from a DataFrame of Uniprot family annotations.

Parameters:

annotations (pd.DataFrame) – The Uniprot family annotations.

Returns:

The extracted family information, structured as (superfamily, family, subfamily, subsubfamily, other_families).

Return type:

pd.DataFrame

enkie.dbs.uniprot.combine_family_names(families_df: pandas.DataFrame, level: str) → str

Combine structured protein family information in a single string.

Parameters:
  • families_df (pd.DataFrame) – The input structured family information.

  • level (str) – The level (one of superfamily, family, subfamily, subsubfamily, other_families) at which information should be combined.

Returns:

The combined family information.

Return type:

str
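
How the names at a given level are combined is not specified above. One plausible reading, sketched here with plain dictionaries standing in for DataFrame rows, is to collect the distinct non-empty names at the requested level and join them. The function combine_family_names_sketch, the semicolon separator, and the sorting are assumptions made for illustration; enkie's actual behavior may differ.

```python
from typing import Dict, List

FAMILY_LEVELS = ["superfamily", "family", "subfamily", "subsubfamily", "other_families"]


def combine_family_names_sketch(rows: List[Dict[str, str]], level: str) -> str:
    """Join the distinct, non-empty family names found at one level."""
    if level not in FAMILY_LEVELS:
        raise ValueError(f"Unknown level: {level!r}")
    names = sorted({row[level] for row in rows if row.get(level)})
    return ";".join(names)


# Each dict mimics one row of the structured family DataFrame.
rows = [
    {"superfamily": "AB hydrolase superfamily", "family": "Lipase family"},
    {"superfamily": "AB hydrolase superfamily", "family": ""},
    {"superfamily": "", "family": "Globin family"},
]
print(combine_family_names_sketch(rows, "family"))
```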