29 changes: 23 additions & 6 deletions README.md
Expand Up @@ -75,7 +75,7 @@ The `result` is a pandas DataFrame containing the mapped IDs (see below), while

## Retrieving Information

All [supported return fields](https://david-araripe.github.io/UniProtMapper/stable/field_reference.html#supported-fields) are both accessible through the attribute `ProtMapper.fields_table`:
A DataFrame with the [supported return fields](https://david-araripe.github.io/UniProtMapper/stable/field_reference.html#supported-fields) is accessible through the attribute `ProtMapper.fields_table`:

```Python
from UniProtMapper import ProtMapper
Expand All @@ -86,11 +86,11 @@ df.head()
```
| | label | returned_field | field_type | has_full_version | type |
|---:|:---------------------|:-----------------|:-----------------|:-------------------|:--------------|
| 0 | Entry | accession | Names & Taxonomy | yes | uniprot_field |
| 1 | Entry Name | id | Names & Taxonomy | yes | uniprot_field |
| 2 | Gene Names | gene_names | Names & Taxonomy | yes | uniprot_field |
| 3 | Gene Names (primary) | gene_primary | Names & Taxonomy | yes | uniprot_field |
| 4 | Gene Names (synonym) | gene_synonym | Names & Taxonomy | yes | uniprot_field |
| 0 | Entry | accession | Names & Taxonomy | - | uniprot_field |
| 1 | Entry Name | id | Names & Taxonomy | - | uniprot_field |
| 2 | Gene Names | gene_names | Names & Taxonomy | - | uniprot_field |
| 3 | Gene Names (primary) | gene_primary | Names & Taxonomy | - | uniprot_field |
| 4 | Gene Names (synonym) | gene_synonym | Names & Taxonomy | - | uniprot_field |

From the DataFrame, all `returned_field` entries can be used to access UniProt data programmatically:

Expand All @@ -105,6 +105,23 @@ result, failed = mapper.get(["Q02880"], fields=fields)
>>> Fetched: 1 / 1
```

Further, for the cross-referenced fields that have `has_full_version` set to `yes`, returning the same field with extra information is supported by passing `<field_name>_full`, such as `xref_pdb_full`.

All available return fields are also accessible through the attribute `ProtMapper.supported_return_fields`:

```python
from UniProtMapper import ProtMapper
mapper = ProtMapper()
print(mapper.supported_return_fields)

>>> ['accession',
>>> 'id',
>>> 'gene_names',
>>> ...
>>> 'xref_smart_full',
>>> 'xref_supfam_full']
```
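As a rough illustration of how the `_full` names relate to the table, the sketch below derives them from a few stand-in rows. The rows, the `no` value and the helper name `expand_return_fields` are assumptions made for this example; only the `<field_name>_full` convention and the `has_full_version == "yes"` condition come from the text above.

```python
# Stand-in rows mimicking a few columns of `ProtMapper.fields_table`.
# These values are illustrative assumptions, not a copy of the real table.
rows = [
    {"returned_field": "accession", "has_full_version": "-"},
    {"returned_field": "xref_pdb", "has_full_version": "yes"},
    {"returned_field": "xref_smart", "has_full_version": "yes"},
    {"returned_field": "xref_hypothetical", "has_full_version": "no"},
]


def expand_return_fields(rows):
    """Return every base field, followed by a `_full` variant where one exists."""
    base = [row["returned_field"] for row in rows]
    full = [
        row["returned_field"] + "_full"
        for row in rows
        if row["has_full_version"] == "yes"
    ]
    return base + full


fields = expand_return_fields(rows)
print(fields)
```

Only rows marked `yes` gain a `_full` counterpart, which is why `accession_full` would not be a valid return field under this scheme.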

## Field-based Querying

UniProtMapper supports complex field-based protein queries using boolean operators (AND, OR, NOT) through the `uniprotkb_fields` module. This allows you to create sophisticated searches combining multiple criteria. For example:
Expand Down
4 changes: 2 additions & 2 deletions docs/source/field_reference.rst
Expand Up @@ -24,10 +24,10 @@ The supported return fields are listed below. The columns contain different info
- **label**: The label used by UniProt to represent this field. Also used as column names on the `pd.DataFrame` returned from `get` methods implemented on both APIs.
- **returned_field**: Name used to specify which information to retrieve by the APIs. For examples, check below.
- **field_type**: The category of the field, as listed above under `Field Categories`. Note that for `type=='cross_reference'`, the field_type is the category of the cross-referenced database.
- **has_full_version**: Always `yes` for `type=='uniprot_field'`. Is used by UniProt to indicate whether a cross-referenced database is fully integrated.
- **has_full_version**: Not available for `type=='uniprot_field'`. If `yes`, a "full" version of the return field is accessible by using ``<field_name>_full``.
- **type**: Either "uniprot_field" or "cross_reference". The former indicates a field that is directly related to the protein, while the latter indicates a field that is a cross-reference to another database and not native to UniProt.

For more up-to-date information on `has_full_version` of cross-referenced fields, check the official UniProt documentation: `Return Fields <https://www.uniprot.org/help/return_fields_databases>`_
For more up-to-date information on `has_full_version` of cross-referenced fields, check the official UniProt documentation: `Return Fields <https://www.uniprot.org/help/return_fields_databases>`_. In case of discrepancies, issues or pull requests are welcome!

.. csv-table:: Supported Return Fields
:header-rows: 1
Expand Down
27 changes: 7 additions & 20 deletions src/UniProtMapper/idmapping_api.py
Expand Up @@ -60,20 +60,6 @@ def __init__(
backoff_factor,
api_url,
)
self.default_fields = (
"accession",
"id",
"gene_names",
"protein_name",
"organism_name",
"organism_id",
"go_id",
"go_p",
"go_c",
"go_f",
"cc_subcellular_location",
"sequence",
)

@property
def _supported_dbs(self) -> list:
Expand Down Expand Up @@ -170,7 +156,7 @@ def get_id_mapping_results_search(self, fields: str, url: str, compressed: bool)
def get(
self,
ids: Union[List[str], str],
fields: Optional[Union[str, List]] = "default",
fields: Optional[Union[str, List]] = None,
from_db: str = "UniProtKB_AC-ID",
to_db: str = "UniProtKB-Swiss-Prot",
compressed: bool = True,
Expand All @@ -181,9 +167,10 @@ def get(

Args:
ids: list of IDs to be mapped or single string.
fields: list of UniProt fields to be retrieved. If None, will return the API's
default fields. `Note:` parameter not supported for datasets that aren't
strictly UniProtKB, e.g.: UniParc, UniRef... Defaults to None.
fields: list of UniProt return fields to be retrieved. If None, will return the
API's default fields. `default` can also be passed to access `self.default_fields`.
**Note:** this parameter is not supported for datasets that aren't strictly UniProtKB,
e.g.: UniParc, UniRef... Defaults to None.
from_db: database for the ids. Defaults to "UniProtKB_AC-ID".
to_db: UniProtDB to query to. For reviewed-only accessions, use default. If
you want to include unreviewed accessions, use "UniProtKB". Defaults to
Expand All @@ -207,10 +194,10 @@ def get(
fields = self.default_fields
else:
fields = np.char.lower(np.array(fields))
if not np.isin(fields, self.fields_table["returned_field"]).all():
if not np.isin(fields, self.supported_return_fields).all():
raise ValueError(
"Invalid fields. Valid fields are: "
f"{self.fields_table['returned_field'].values}"
f"{self.supported_return_fields}"
)
if to_db not in ["UniProtKB-Swiss-Prot", "UniProtKB"]:
if fields is not None:
Expand Down
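The field check in `get` can be read as "lowercase the request, then reject anything outside the supported list". Below is a minimal plain-Python sketch of that idea. It swaps the `np.char.lower` / `np.isin` calls from the diff for list comprehensions, and the function name `validate_fields` and the sample `supported` list are assumptions for illustration only.

```python
def validate_fields(fields, supported):
    """Lowercase the requested fields and raise if any is unsupported."""
    if isinstance(fields, str):
        # Accept a single field name as well as a list of names.
        fields = [fields]
    lowered = [field.lower() for field in fields]
    invalid = [field for field in lowered if field not in supported]
    if invalid:
        raise ValueError(
            f"Invalid fields: {invalid}. Valid fields are: {supported}"
        )
    return lowered


# Hypothetical subset of supported return fields, including a `_full` variant.
supported = ["accession", "id", "gene_names", "xref_pdb", "xref_pdb_full"]

print(validate_fields(["Accession", "XREF_PDB_FULL"], supported))

try:
    validate_fields(["not_a_field"], supported)
except ValueError as err:
    print(err)
```

Validating against the expanded list rather than the `returned_field` column alone is what lets `_full` names through; with the column-only check they would be rejected.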
30 changes: 28 additions & 2 deletions src/UniProtMapper/interface.py
Expand Up @@ -22,6 +22,22 @@ class BaseUniProt(ABC):
- BaseUniProt -> ProtKB (UniProtKB API)
"""

fields_table = read_fields_table()
default_fields = (
"accession",
"id",
"gene_names",
"protein_name",
"organism_name",
"organism_id",
"go_id",
"go_p",
"go_c",
"go_f",
"cc_subcellular_location",
"sequence",
)

def __init__(
self,
pooling_interval: int = 3,
Expand All @@ -44,10 +60,20 @@ def __init__(
self.session = requests.Session()
self._setup_session()
self._re_next_link = re.compile(r'<(.+)>; rel="next"')
self._cached_supported_return_fields = None

@property
def fields_table(self) -> None:
return read_fields_table()
def supported_return_fields(self) -> list:
"""Return a list of the supported fields in UniProtKB & ID mapping API."""
if self._cached_supported_return_fields is None:
full_version_fields = (
self.fields_table.query('has_full_version == "yes"')["returned_field"]
+ "_full"
).tolist()
self._cached_supported_return_fields = (
self.fields_table["returned_field"].tolist() + full_version_fields
)
return self._cached_supported_return_fields

def _setup_retries(self, total_retries, backoff_factor) -> None:
return Retry(
Expand Down
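The `supported_return_fields` property in this file follows a compute-once, cache-on-the-instance pattern. The sketch below shows that pattern in isolation, under stated assumptions: the class `FieldRegistry`, its stand-in table and the `build_count` counter are hypothetical and exist only to make the caching observable.

```python
class FieldRegistry:
    """Hypothetical stand-in for a class exposing a lazily built field list."""

    # Class-level table shared by all instances (illustrative rows only).
    fields_table = [
        {"returned_field": "accession", "has_full_version": "-"},
        {"returned_field": "xref_pdb", "has_full_version": "yes"},
    ]

    def __init__(self):
        self._cached_supported_return_fields = None
        self.build_count = 0  # Counts how often the list is actually built.

    @property
    def supported_return_fields(self):
        # Build the list on first access only; later accesses reuse the cache.
        if self._cached_supported_return_fields is None:
            self.build_count += 1
            base = [row["returned_field"] for row in self.fields_table]
            full = [
                row["returned_field"] + "_full"
                for row in self.fields_table
                if row["has_full_version"] == "yes"
            ]
            self._cached_supported_return_fields = base + full
        return self._cached_supported_return_fields


registry = FieldRegistry()
first = registry.supported_return_fields
second = registry.supported_return_fields
print(first, registry.build_count)
```

The standard library's `functools.cached_property` (Python 3.8+) offers a similar effect with less code; the explicit `None` check shown here mirrors the shape used in the diff.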