Cannot filter index by availability of linked structures

The loader at https://github.com/plinder-org/plinder/blob/6ce457240fbac12ca759be64988bd38aa0631e64/src/plinder/core/loader/dataset.py#L50 does not let one filter a given split's indices by the availability of linked (apo or pred) structures for each system. Instead, one must currently load the split parquet file manually, like so:
```python
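import glob
import os

import pandas as pd

# Locate the split file under the local data directory and load it.
df = pd.read_parquet(
    glob.glob(os.path.join(data_dir, "*", "*", "*", "splits", "split.parquet"))[0]
)
# Keep only systems in the requested split that have a linked apo or pred structure.
split_df = df[
    (df["split"] == split) & (df["system_has_apo_or_pred"] == True)  # noqa: E712
]

# Deduplicate the system IDs and record how many examples remain.
self._system_ids = list(set(split_df["system_id"]))
self._num_examples = len(self._system_ids)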
```
Further, this only lets one filter by the presence of either an `apo` or a `pred` structure, not by solely one or the other. This is inconvenient and could be improved by adding the `system_has_{apo,pred,apo_or_pred}` columns to the annotations parquet file.
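
To make the request concrete, here is a minimal sketch of the filtering I have in mind, assuming the proposed columns exist. Only `split`, `system_id`, and `system_has_apo_or_pred` appear in the current split file; `system_has_apo` and `system_has_pred` are the hypothetical additions, and `filter_system_ids` is an illustrative helper name, not an existing plinder function. The toy DataFrame stands in for the parquet file.

```python
import pandas as pd

# Maps a requested linked-structure kind to a boolean column name.
# "system_has_apo" and "system_has_pred" are ASSUMED columns (the ones
# this issue proposes); only "system_has_apo_or_pred" exists today.
LINKED_COLUMNS = {
    "apo": "system_has_apo",
    "pred": "system_has_pred",
    "apo_or_pred": "system_has_apo_or_pred",
}


def filter_system_ids(df, split, linked=None):
    """Return sorted unique system IDs in `split`, optionally restricted
    to systems that have the requested kind of linked structure."""
    mask = df["split"] == split
    if linked is not None:
        column = LINKED_COLUMNS[linked]
        if column not in df.columns:
            raise KeyError(f"column {column!r} is not available")
        # .eq(True) treats missing values as False.
        mask = mask & df[column].eq(True)
    return sorted(df.loc[mask, "system_id"].unique().tolist())


# Toy stand-in for the split/annotations table (IDs are made up).
toy = pd.DataFrame(
    {
        "system_id": ["sys_a", "sys_b", "sys_c", "sys_d"],
        "split": ["train", "train", "train", "val"],
        "system_has_apo": [True, False, False, True],
        "system_has_pred": [False, True, False, True],
    }
)
toy["system_has_apo_or_pred"] = toy["system_has_apo"] | toy["system_has_pred"]

apo_only = filter_system_ids(toy, "train", linked="apo")
pred_only = filter_system_ids(toy, "train", linked="pred")
either = filter_system_ids(toy, "train", linked="apo_or_pred")
unfiltered = filter_system_ids(toy, "train")
```

With something like this available in the loader, the dataset class would not need to glob for and read `split.parquet` itself.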
