feat: allow ingesting custom data with python client #329
| Args: | ||
| extractor: ContainerizedExtractor instance (or rid of one) to use for extracting and ingesting data. | ||
| sources: Mapping of environment variables to source files to use with the extractor. |
Could we clarify this description a bit? Maybe an example of env var -> source file? I'm not sure what this would be.
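For illustration, a `sources` mapping might look like the sketch below. The environment variable names and file paths are hypothetical; the real keys would be whatever inputs the containerized extractor registered.

```python
from pathlib import Path
from typing import Mapping

# Hypothetical example: keys are the environment variables the extractor
# reads inside its container, values are local files to upload for each one.
sources: Mapping[str, Path] = {
    "TELEMETRY_CSV": Path("data/flight_01/telemetry.csv"),
    "CONFIG_JSON": Path("data/flight_01/config.json"),
}

for env_var, source_path in sources.items():
    print(f"{env_var} -> {source_path.name}")
```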
| sources: Mapping of environment variables to source files to use with the extractor. | ||
| NOTE: these must match the registered inputs of the containerized extractor exactly | ||
| tag: Docker image tag to use the extractor of. | ||
| NOTE: if not provided, the default registered docker tag will be used. |
Can a custom extractor contain multiple Docker containers?
| if extractor_input.required and extractor_input.environment_variable not in sources: | ||
| raise ValueError(f"Required input '{extractor_input.environment_variable} not present in sources!") | ||
|
|
||
| # Ensure all provided inputs are permitted by the etractor |
| # Ensure all provided inputs are permitted by the etractor | |
| # Ensure all provided inputs are permitted by the extractor |
| self._clients.upload, | ||
| ) | ||
| logger.info("Uploaded %s -> %s", source_path, s3_path) | ||
| s3_inputs[source] = source_path |
If the operation fails beyond this point, is it the client's responsibility to delete the uploaded files?
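One possible answer is a best-effort cleanup in the client. The sketch below is only an assumption about how that could look: `upload`, `delete`, and `ingest` are hypothetical callables standing in for whatever the client actually uses, and the PR does not show whether a delete operation is available at all.

```python
from pathlib import Path
from typing import Callable, Dict, List, Mapping


def upload_then_ingest(
    sources: Mapping[str, Path],
    upload: Callable[[Path], str],              # hypothetical: returns the uploaded location
    delete: Callable[[str], None],              # hypothetical: removes an uploaded file
    ingest: Callable[[Mapping[str, str]], str],  # hypothetical: starts the ingest
) -> str:
    """Upload every source, then ingest; remove uploads if anything fails."""
    uploaded: List[str] = []
    s3_inputs: Dict[str, str] = {}
    try:
        for env_var, source_path in sources.items():
            s3_path = upload(source_path)
            uploaded.append(s3_path)
            # Assumption: the ingest step wants the uploaded location.
            s3_inputs[env_var] = s3_path
        return ingest(s3_inputs)
    except Exception:
        # Best-effort cleanup: a failed delete must not mask the original error.
        for s3_path in uploaded:
            try:
                delete(s3_path)
            except Exception:
                pass
        raise
```

With this shape the original exception still propagates to the caller, so the caller sees the real failure rather than a cleanup error.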
| for extractor_input in extractor.inputs: | ||
| registered_inputs.add(extractor_input.environment_variable) | ||
| if extractor_input.required and extractor_input.environment_variable not in sources: | ||
| raise ValueError(f"Required input '{extractor_input.environment_variable} not present in sources!") |
| raise ValueError(f"Required input '{extractor_input.environment_variable} not present in sources!") | |
| raise ValueError(f"Required input '{extractor_input.environment_variable}' not present in sources!") |
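Taken together, the two checks in this diff (required inputs present, provided inputs permitted) could be sketched as a standalone function. `ExtractorInput` here is a stand-in dataclass, not the client's real type, and the second error message is an assumption since the diff only shows the comment for that check.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Mapping


@dataclass(frozen=True)
class ExtractorInput:
    """Stand-in for a registered extractor input (hypothetical type)."""

    environment_variable: str
    required: bool = False


def validate_sources(inputs: Iterable[ExtractorInput], sources: Mapping[str, Path]) -> None:
    """Raise ValueError if a required input is missing or an unknown one is given."""
    registered_inputs = set()
    for extractor_input in inputs:
        registered_inputs.add(extractor_input.environment_variable)
        if extractor_input.required and extractor_input.environment_variable not in sources:
            raise ValueError(
                f"Required input '{extractor_input.environment_variable}' not present in sources!"
            )

    # Ensure all provided inputs are permitted by the extractor.
    unknown = sorted(set(sources) - registered_inputs)
    if unknown:
        raise ValueError(f"Inputs not registered with the extractor: {unknown}")
```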
| if isinstance(extractor, str): | ||
| extractor = ContainerizedExtractor._from_conjure( | ||
| self._clients, | ||
| self._clients.containerized_extractors.get_containerized_extractor(self.rid, extractor), |
Can we just create a method to get an extractor instead of inlining this?
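A minimal sketch of such a helper, under stated assumptions: `Extractor` is a stand-in for `ContainerizedExtractor`, and `fetch` is a hypothetical callable wrapping the lookup-by-rid call, so none of the names below are the client's real API.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Extractor:
    """Stand-in for ContainerizedExtractor (hypothetical)."""

    rid: str


def resolve_extractor(
    extractor: Union[str, Extractor],
    fetch: Callable[[str], Extractor],  # hypothetical: looks an extractor up by rid
) -> Extractor:
    """Return the instance unchanged, or fetch it when only a rid was given."""
    if isinstance(extractor, str):
        return fetch(extractor)
    return extractor
```

The caller then reduces to a single `extractor = resolve_extractor(extractor, fetch)` line, and the lookup can be reused elsewhere.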
| def add_custom( | ||
| self, | ||
| extractor: str | ContainerizedExtractor, | ||
| sources: Mapping[str, Path], |
I think we generally do Path | str everywhere
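Accepting `Path | str` could be handled by normalizing once at the top of the method. This is a sketch with a hypothetical helper name, not code from the PR:

```python
from pathlib import Path
from typing import Dict, Mapping, Union


def normalize_sources(sources: Mapping[str, Union[Path, str]]) -> Dict[str, Path]:
    """Convert every source value to a Path so later code handles one type."""
    return {env_var: Path(source) for env_var, source in sources.items()}
```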
Co-authored-by: Alexander Reynolds <alex.reynolds@nominal.io>
Co-authored-by: Stefan van der Walt <stefanv@berkeley.edu>
No description provided.