With
index = query_index(splits=[split], filters=filters)
one cannot filter a given split's indices by the availability of linked (apo or pred) structures for each system. Instead, one must currently load the split parquet file manually, like so:
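import glob
import os
import pandas as pd

# Load the first split parquet file found under data_dir
df = pd.read_parquet(
    glob.glob(os.path.join(data_dir, "*", "*", "*", "splits", "split.parquet"))[0]
)
# Keep rows in the requested split that have a linked apo or pred structure
split_df = df[
    (df["split"] == split) & (df["system_has_apo_or_pred"] == True)  # noqa: E712
]
self._system_ids = list(set(split_df["system_id"]))
self._num_examples = len(self._system_ids)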
Further, this only lets one filter by the presence of either an apo or pred structure, but not solely one or the other. This behavior is inconvenient and could be improved by adding the system_has_{apo,pred,apo_or_pred} columns to the annotations parquet file.
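To make the request concrete, here is a minimal sketch of the filtering I would like to be able to do. It uses plain pandas on a toy table, not the plinder API. The helper `select_system_ids` is hypothetical, and the `system_has_apo` and `system_has_pred` columns are assumptions taken from the proposal above (only `system_has_apo_or_pred` is known to exist in the split parquet file today).

```python
import pandas as pd


def select_system_ids(df, split, linked="apo_or_pred"):
    """Hypothetical helper (not part of plinder): return the sorted system IDs
    in `split` whose `system_has_<linked>` flag is set."""
    column = f"system_has_{linked}"
    if column not in df.columns:
        raise KeyError(f"missing column: {column}")
    mask = (df["split"] == split) & df[column].astype(bool)
    return sorted(set(df.loc[mask, "system_id"]))


# Toy data standing in for the annotations table; values are made up.
toy = pd.DataFrame(
    {
        "system_id": ["s1", "s2", "s3", "s4"],
        "split": ["train", "train", "train", "val"],
        "system_has_apo": [True, False, False, True],    # assumed column
        "system_has_pred": [False, True, False, True],   # assumed column
    }
)
# The combined flag is just the logical OR of the two specific flags.
toy["system_has_apo_or_pred"] = toy["system_has_apo"] | toy["system_has_pred"]

train_apo = select_system_ids(toy, "train", linked="apo")
train_pred = select_system_ids(toy, "train", linked="pred")
train_either = select_system_ids(toy, "train", linked="apo_or_pred")
```

With the three columns available, each case becomes a single boolean mask, and the same flags could presumably be exposed through the filters argument of query_index instead of a manual parquet read.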
The query_index call above is at plinder/src/plinder/core/loader/dataset.py, line 50 in 6ce4572.