feat: add possibility to get dataset by urn without loading all datasets #142 #141

Open
atomao wants to merge 8 commits into development from add_fast_get_dataset_by_urn

Contributor

@atomao atomao commented Feb 13, 2026

Applicable issues

  • fixes Add functionality to retrieve one specific dataset #142 (same info as here)
  • Current implementations such as get_dataset_by_source_id or list_available_dataset load all channel datasets, which can be slow when we want to retrieve only one specific dataset.
  • The changes above have been reverted. Instead, we added the possibility to get dataset and dataset version models through the dataset service (and then download the dataset using the handler).

Description of changes

  • Added get_dataset_by_urn method to ChannelServiceFacade to retrieve only one dataset without loading all.
  • Added get_dataset_by_uuid method to DataSetService to get specific dataset model by uuid.
  • Added get_last_completed_dataset_version_by_uuid method to DataSetService to get specific dataset last completed version model by uuid.
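
A minimal sketch of the idea behind the two DataSetService methods, not the code in this PR. The method names come from the description above; the model fields, the status values, the synchronous signatures, and the in-memory `InMemoryDataSetService` class are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class DataSetModel:
    uuid: str
    title: str


@dataclass(frozen=True)
class DataSetVersionModel:
    dataset_uuid: str
    version: int
    status: str  # hypothetical values: "completed", "failed"


class InMemoryDataSetService:
    """Hypothetical stand-in for DataSetService, backed by plain Python collections."""

    def __init__(
        self,
        datasets: Iterable[DataSetModel],
        versions: Iterable[DataSetVersionModel],
    ) -> None:
        self._datasets = {d.uuid: d for d in datasets}
        self._versions = list(versions)

    def get_dataset_by_uuid(self, uuid: str) -> Optional[DataSetModel]:
        # Keyed lookup: no other dataset is read or instantiated.
        return self._datasets.get(uuid)

    def get_last_completed_dataset_version_by_uuid(
        self, uuid: str
    ) -> Optional[DataSetVersionModel]:
        # Keep only completed versions of this dataset, then take the newest.
        completed = [
            v for v in self._versions
            if v.dataset_uuid == uuid and v.status == "completed"
        ]
        return max(completed, key=lambda v: v.version, default=None)


service = InMemoryDataSetService(
    datasets=[DataSetModel("a1", "Population"), DataSetModel("b2", "Trade")],
    versions=[
        DataSetVersionModel("a1", 1, "completed"),
        DataSetVersionModel("a1", 2, "completed"),
        DataSetVersionModel("a1", 3, "failed"),
    ],
)
dataset = service.get_dataset_by_uuid("a1")
latest = service.get_last_completed_dataset_version_by_uuid("a1")
```

In this sketch the failed version 3 is skipped, so the last completed version for "a1" is version 2, and a dataset with no completed versions yields None.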

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

@atomao atomao self-assigned this Feb 13, 2026
@atomao atomao requested a review from ypldan as a code owner February 13, 2026 13:23
@navalnica
Contributor

Could you please create an issue and link it here? This allows us to include this PR/issue in the release notes for the next release. Do not forget to add the issue number at the end of the PR name. Thanks.

Comment on lines 479 to 486
async def get_dataset_by_source_id(
    self, auth_context: AuthContext, dataset_id: str
) -> DataSet | None:
    datasets = await self._load_datasets(auth_context)
    for ds in datasets:
        if ds.data.source_id == dataset_id:
            return ds.data
    return None
Contributor

I guess we can now optimize this function:

  1. map source_id to the dataset urn (filter dataset models from the db)
  2. instantiate the dataset instance only once, using the new _get_dataset_by_urn function

We can probably optimize other functions in this module as well. @Fedir-Yatsenko, what do you think?
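
The two steps suggested above could look roughly like the sketch below. It is illustrative only: `DatasetRow`, `find_urn_by_source_id`, and the `build_dataset` callback are hypothetical names, the callback stands in for the new `_get_dataset_by_urn` function, and the real method is async and takes an auth context.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional


@dataclass(frozen=True)
class DatasetRow:
    """Hypothetical lightweight DB row: just enough to map source_id to URN."""
    urn: str
    source_id: str


def find_urn_by_source_id(rows: Iterable[DatasetRow], source_id: str) -> Optional[str]:
    # Step 1: filter cheap rows instead of fully instantiated datasets.
    for row in rows:
        if row.source_id == source_id:
            return row.urn
    return None


def get_dataset_by_source_id(
    rows: Iterable[DatasetRow],
    source_id: str,
    build_dataset: Callable[[str], Any],
) -> Optional[Any]:
    urn = find_urn_by_source_id(rows, source_id)
    if urn is None:
        return None
    # Step 2: instantiate exactly one dataset, for the matching URN.
    return build_dataset(urn)


built: List[str] = []  # records which URNs were actually instantiated


def build_dataset(urn: str) -> dict:
    built.append(urn)
    return {"urn": urn}


rows = [DatasetRow("urn:a", "src-1"), DatasetRow("urn:b", "src-2")]
result = get_dataset_by_source_id(rows, "src-2", build_dataset)
missing = get_dataset_by_source_id(rows, "src-9", build_dataset)
```

Only the matching dataset is built here, and an unknown source_id builds nothing at all.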

]

if len(dataset_models) > 1:
    raise ValueError(f"Multiple datasets found for the same URN: {dataset_urn.short_urn()}")
Contributor

@Fedir-Yatsenko, do you want to raise here or simply pick the first model?

Collaborator

I'm not sure that extracting a list of all datasets is the best option. Let's discuss.

@atomao atomao changed the title from "feat: add possibility to get dataset by urn without loading all datasets" to "feat: add possibility to get dataset by urn without loading all datasets #142" Feb 13, 2026
Comment on lines +451 to +453
    raise ValueError(
        f"Multiple data sources found for the same dataset: {dataset_model.id}"
    )
Contributor

@Fedir-Yatsenko same here: raise or pick first?

Collaborator

Honestly, I don't understand why we need to query for a list of items if we can retrieve an item by ID? (get_by_id or get_schema_by_id)

Contributor Author

Done for data sources. For dataset models, however, we first need to fetch all versions and then fetch the dataset models to find the model with the correct urn.
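
For illustration, the dataset-model lookup described in this reply might be shaped like the sketch below. This is a guess at the structure, not the PR's code: `VersionRow`, `DatasetModelRow`, and `find_model_by_urn` are hypothetical, the data lives in memory rather than in the service, and raising on duplicates mirrors the ValueError shown in the diff (picking the first match is the alternative raised in the review).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class VersionRow:
    id: int


@dataclass(frozen=True)
class DatasetModelRow:
    version_id: int
    urn: str


def find_model_by_urn(
    versions: Iterable[VersionRow],
    models_by_version: Dict[int, List[DatasetModelRow]],
    urn: str,
) -> Optional[DatasetModelRow]:
    # Walk every version, collect the models attached to it, keep URN matches.
    matches = [
        model
        for version in versions
        for model in models_by_version.get(version.id, [])
        if model.urn == urn
    ]
    if len(matches) > 1:
        # Assumption: duplicates are treated as an error rather than resolved silently.
        raise ValueError(f"Multiple datasets found for the same URN: {urn}")
    return matches[0] if matches else None


versions = [VersionRow(1), VersionRow(2)]
models_by_version = {
    1: [DatasetModelRow(1, "urn:a")],
    2: [DatasetModelRow(2, "urn:b")],
}
found = find_model_by_urn(versions, models_by_version, "urn:b")
absent = find_model_by_urn(versions, models_by_version, "urn:z")
```

Unlike a lookup by ID, this path still scans versions, which is why it cannot be reduced to a single get_by_id call unless the URN itself is indexed.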

Development

Successfully merging this pull request may close these issues.

Add functionality to retrieve one specific dataset

3 participants