@drake-nominal
Contributor

No description provided.

Args:
    extractor: ContainerizedExtractor instance (or rid of one) to use for extracting and ingesting data.
    sources: Mapping of environment variables to source files to use with the extractor.
Collaborator

Could we clarify this description a bit? Maybe an example of env var -> source file? I'm not sure what this would be.
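For illustration, a `sources` mapping pairs each environment variable that the containerized extractor reads with a local file to upload. The variable names and paths below are hypothetical; per the docstring, the real keys must match the extractor's registered inputs exactly.

```python
from pathlib import Path

# Hypothetical example: keys are environment variable names registered as
# inputs on the containerized extractor; values are local files to upload.
sources = {
    "INPUT_CSV": Path("data/run.csv"),
    "CONFIG_YAML": Path("config/extractor.yaml"),
}

for env_var, path in sources.items():
    print(f"{env_var} -> {path}")
```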

    sources: Mapping of environment variables to source files to use with the extractor.
        NOTE: these must match the registered inputs of the containerized extractor exactly.
    tag: Docker image tag of the extractor to use.
        NOTE: if not provided, the default registered Docker tag will be used.
Collaborator

Can a custom extractor contain multiple Docker containers?

if extractor_input.required and extractor_input.environment_variable not in sources:
    raise ValueError(f"Required input '{extractor_input.environment_variable} not present in sources!")

# Ensure all provided inputs are permitted by the etractor
Collaborator

Suggested change
# Ensure all provided inputs are permitted by the etractor
# Ensure all provided inputs are permitted by the extractor
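A minimal, self-contained sketch of both checks in this hunk (every required input is present, and no unregistered input is supplied). It assumes a simplified stand-in `ExtractorInput` with the `environment_variable` and `required` attributes used in the diff; `validate_sources` is a hypothetical helper name, not part of this PR.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Sequence


@dataclass(frozen=True)
class ExtractorInput:
    # Hypothetical stand-in for the extractor's registered input metadata.
    environment_variable: str
    required: bool


def validate_sources(inputs: Sequence[ExtractorInput], sources: Mapping[str, Path]) -> None:
    """Raise ValueError if a required input is missing or an unknown input is given."""
    registered_inputs = set()
    for extractor_input in inputs:
        registered_inputs.add(extractor_input.environment_variable)
        if extractor_input.required and extractor_input.environment_variable not in sources:
            raise ValueError(f"Required input '{extractor_input.environment_variable}' not present in sources!")

    # Ensure all provided inputs are permitted by the extractor
    unknown = set(sources) - registered_inputs
    if unknown:
        raise ValueError(f"Inputs not registered with the extractor: {sorted(unknown)}")
```

Collecting the registered names in the same loop that checks required inputs keeps this to a single pass over `inputs`; the set difference then reports every unexpected key at once rather than failing on the first.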

self._clients.upload,
)
logger.info("Uploaded %s -> %s", source_path, s3_path)
s3_inputs[source] = source_path
Collaborator

If the operation fails beyond this point, is it the client's responsibility to delete the uploaded files?
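One possible approach, sketched under assumptions: record each uploaded object and delete them all if a later step raises. `FakeStore` is an in-memory stand-in for S3, and `upload_then_run` and `start_extraction` are hypothetical names; none of this reflects the client's actual upload API.

```python
from pathlib import Path
from typing import Callable, Dict, List, Mapping


class FakeStore:
    """In-memory stand-in for an object store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.objects: Dict[str, Path] = {}

    def upload(self, source_path: Path) -> str:
        key = f"s3://bucket/{source_path.name}"
        self.objects[key] = source_path
        return key

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)


def upload_then_run(
    store: FakeStore,
    sources: Mapping[str, Path],
    start_extraction: Callable[[Dict[str, str]], None],
) -> Dict[str, str]:
    uploaded: List[str] = []
    s3_inputs: Dict[str, str] = {}
    try:
        for env_var, source_path in sources.items():
            s3_path = store.upload(source_path)
            uploaded.append(s3_path)
            s3_inputs[env_var] = s3_path
        start_extraction(s3_inputs)
    except Exception:
        # Best-effort rollback: remove everything uploaded so far, then re-raise.
        for s3_path in uploaded:
            store.delete(s3_path)
        raise
    return s3_inputs
```

A real implementation would also need to decide whether a failed delete during rollback should be logged and swallowed so that it does not mask the original error.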

for extractor_input in extractor.inputs:
    registered_inputs.add(extractor_input.environment_variable)
    if extractor_input.required and extractor_input.environment_variable not in sources:
        raise ValueError(f"Required input '{extractor_input.environment_variable} not present in sources!")
Contributor

Suggested change
raise ValueError(f"Required input '{extractor_input.environment_variable} not present in sources!")
raise ValueError(f"Required input '{extractor_input.environment_variable}' not present in sources!")

if isinstance(extractor, str):
    extractor = ContainerizedExtractor._from_conjure(
        self._clients,
        self._clients.containerized_extractors.get_containerized_extractor(self.rid, extractor),
Contributor

Can we just create a method to get an extractor instead of inlining this?
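A sketch of the suggested refactor, assuming the class that defines `add_custom` exposes `_clients` and `rid` as in the diff. `Owner` and `_get_extractor` are hypothetical names, and the stub classes only mimic the two calls shown in the hunk (`_from_conjure` and `get_containerized_extractor`) so the example runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ContainerizedExtractor:
    # Stub: the real class is built from a Conjure response via _from_conjure.
    rid: str

    @classmethod
    def _from_conjure(cls, clients: Any, raw: dict) -> ContainerizedExtractor:
        return cls(rid=raw["rid"])


class _StubExtractorService:
    def get_containerized_extractor(self, owner_rid: str, extractor_rid: str) -> dict:
        return {"rid": extractor_rid}


class _StubClients:
    containerized_extractors = _StubExtractorService()


class Owner:
    def __init__(self, rid: str) -> None:
        self.rid = rid
        self._clients = _StubClients()

    def _get_extractor(self, extractor: str | ContainerizedExtractor) -> ContainerizedExtractor:
        """Resolve an extractor rid to a ContainerizedExtractor; pass instances through."""
        if isinstance(extractor, ContainerizedExtractor):
            return extractor
        raw = self._clients.containerized_extractors.get_containerized_extractor(self.rid, extractor)
        return ContainerizedExtractor._from_conjure(self._clients, raw)
```

With this helper, `add_custom` can begin with `extractor = self._get_extractor(extractor)`, and any other method that accepts either form can reuse the same lookup.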

def add_custom(
    self,
    extractor: str | ContainerizedExtractor,
    sources: Mapping[str, Path],
Contributor

I think we generally use Path | str everywhere.
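A minimal sketch of accepting `Path | str` values and normalizing them once at the API boundary; `normalize_sources` is a hypothetical helper name.

```python
from __future__ import annotations

from pathlib import Path
from typing import Mapping


def normalize_sources(sources: Mapping[str, Path | str]) -> dict[str, Path]:
    """Convert every source value to a Path so downstream code handles one type."""
    return {env_var: Path(path) for env_var, path in sources.items()}
```

Converting up front means the upload loop can rely on `Path` methods such as `.name` or `.exists()` without per-call type checks.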

Base automatically changed from deidukas/feat-register-extractors to main July 24, 2025 16:41
varun-nominal and others added 3 commits July 29, 2025 11:16
Co-authored-by: Alexander Reynolds <alex.reynolds@nominal.io>
Co-authored-by: Alexander Reynolds <alex.reynolds@nominal.io>
Co-authored-by: Stefan van der Walt <stefanv@berkeley.edu>

5 participants