Dataset should support binary Buffer payload #3211
Hello, and thank you for your interest in this project! Because of the Dataset implementation, you cannot store binary data directly in it — the Dataset backend (both local and on the Apify Platform) stores the individual Dataset items as serialized JSON objects. If you want to store binary data from the Crawlee crawlers, you can use the Key-Value Store class (docs). This is a Crawlee-native S3-like storage that allows you to store arbitrary (binary) data under string keys. Alternatively, if you have to store the data in the dataset, you could serialize the buffer yourself (a JSON-serialized Buffer can later be restored with Buffer.from).

I'll close this discussion as solved, but feel free to ask additional questions if you have any. Cheers!
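For illustration, a minimal sketch of the key-value store approach inside a request handler (the store name, key scheme, and URL are made up for this example):

```ts
import { PlaywrightCrawler, KeyValueStore } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request }) {
        // Playwright returns the screenshot as a Buffer.
        const screenshot = await page.screenshot({ fullPage: true });

        // Key-value stores accept arbitrary binary values under string keys,
        // so the Buffer can be stored as-is, along with a content type.
        const store = await KeyValueStore.open('screenshots');
        await store.setValue(`screenshot-${request.id}`, screenshot, {
            contentType: 'image/png',
        });
    },
});

await crawler.run(['https://example.com']);
```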
@barjin, thanks for the reply!
Which package is the feature request for? If unsure which one to select, leave blank
@crawlee/core
Feature
It would be great to have a plugin or API that allows accessing the binary data directly, or at least the option to store it in a Dataset without serialization.
I see that the current interface allows restoring data via Buffer.from, but I'm not sure about the efficiency of this approach or the acceptable data size it can handle.
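For illustration, a minimal sketch of that round-trip (the payload is made up): JSON-serializing a Buffer turns it into a `{ type: 'Buffer', data: [...] }` object, which expands every byte into a decimal number in the JSON text:

```ts
import { Dataset } from 'crawlee';

// Pushing an object containing a Buffer JSON-serializes the Buffer
// into { "type": "Buffer", "data": [ ...one number per byte... ] }.
const screenshot = Buffer.from('pretend these are PNG bytes');
const dataset = await Dataset.open();
await dataset.pushData({ url: 'https://example.com', screenshot });

// Reading the item back yields the plain serialized object;
// Buffer.from() can rebuild the Buffer from its `data` array.
const { items } = await dataset.getData();
const restored = Buffer.from(items[0].screenshot.data);
```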
Motivation
I’d like to share my use case.
I have the Crawlee logic extracted into a separate package, and the result of this package’s work is a screenshot.
Right now, I have to upload it to S3 directly inside the requestHandler, which feels like a terrible anti-pattern.
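For context, a rough sketch of what that workaround looks like, assuming @aws-sdk/client-s3 and a made-up bucket name:

```ts
import { PlaywrightCrawler } from 'crawlee';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({});

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request }) {
        const screenshot = await page.screenshot({ fullPage: true });

        // The upload happens inside the crawling logic itself,
        // which is the coupling this feature request wants to avoid.
        await s3.send(new PutObjectCommand({
            Bucket: 'my-screenshots-bucket', // made-up bucket name
            Key: `${encodeURIComponent(request.uniqueKey)}.png`,
            Body: screenshot,
            ContentType: 'image/png',
        }));
    },
});
```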
Ideal solution or implementation, and any additional constraints
The most straightforward solution would be to provide an interface for working with S3.
Alternative solutions or implementations
No response
Other context
No response