feat: add possibility to get dataset by urn without loading all datasets #142#141
feat: add possibility to get dataset by urn without loading all datasets #142#141atomao wants to merge 8 commits intodevelopmentfrom
Conversation
|
could you pls create issue and link it here? this allows to include this PR/issue in release notes on the next release. do not forget to add issue number in the end of PR name. thanks |
| async def get_dataset_by_source_id( | ||
| self, auth_context: AuthContext, dataset_id: str | ||
| ) -> DataSet | None: | ||
| datasets = await self._load_datasets(auth_context) | ||
| for ds in datasets: | ||
| if ds.data.source_id == dataset_id: | ||
| return ds.data | ||
| return None |
There was a problem hiding this comment.
I guess we now can optimize this function:
- map source_id to dataset urn (filter dataset models from db)
- instantiate dataset instance only once, using new
_get_dataset_by_urnfunction
also, probably can optimize other functions in this module. @Fedir-Yatsenko , what do you think?
statgpt/app/services/chat_facade.py
Outdated
| ] | ||
|
|
||
| if len(dataset_models) > 1: | ||
| raise ValueError(f"Multiple datasets found for the same URN: {dataset_urn.short_urn()}") |
There was a problem hiding this comment.
@Fedir-Yatsenko , do you want to raise or simply pick the first model here?
There was a problem hiding this comment.
I'm not sure that extracting a list of all datasets is the best option. Let's discuss.
statgpt/app/services/chat_facade.py
Outdated
| raise ValueError( | ||
| f"Multiple data sources found for the same dataset: {dataset_model.id}" | ||
| ) |
There was a problem hiding this comment.
@Fedir-Yatsenko same here: raise or pick first?
There was a problem hiding this comment.
Honestly, I don't understand why we need to query for a list of items if we can retrieve an item by ID? (get_by_id or get_schema_by_id)
There was a problem hiding this comment.
done for data sources, but for dataset models we first need to fetch all versions and then fetch ds models to find model with correct urn
…taset_version model by uuid (added in dataset service)
Applicable issues
fixes Add functionality to retrieve one specific dataset #142 (same info as here)Current implementations likeget_dataset_by_source_idorlist_available_datasetloads all channel datasets which can be slow when we want to retrieve only one specific dataset.Description of changes
Addedget_dataset_by_urnmethod toChannelServiceFacadeto retrieve only one dataset without loading all.get_dataset_by_uuidmethod toDataSetServiceto get specific dataset model by uuid.get_last_completed_dataset_version_by_uuidmethod toDataSetServiceto get specific dataset last completed version model by uuid.By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.