Load only a subset of columns

## 🚀 Feature Request

I would like the ability to load only certain key/value pairs of the sample dict returned by a `StreamingDataset`. The keys to load would be specified when the object is initialized.

## Motivation

I have a dataset with many fields, which supports many of our users' different training and testing needs. However, each user typically uses only a smallish subset of the fields; the remainder are loaded and never used. To streamline data loading, it seems I would have to subselect the data and save it to separate datasets appropriate for each use, which is difficult to manage, prone to errors, and likely to introduce some storage redundancy. It would be better to load only what is needed from each shard.

## [Optional] Implementation

I can think of three ways to do this:
1. Load the datapoint as is done currently and discard the fields not requested. (This does not save any time and is ridiculous, but it provides the requested functionality.)
2. Save each field to a separate dataset, then perform a bunch of _coordinated_ reads and stitch them together. I suppose I could do this myself if it were easy to coordinate the random reads of each sub-object.
3. Ask here for this functionality to be added to the package. If the design of shard files does not easily support this modification, this is a dead end.
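
For illustration, options 1 and 2 could be sketched roughly as below. This is only a sketch under assumptions: `ColumnSubset` and `StitchedColumns` are hypothetical names, not part of the package, and they wrap any object that supports `len()` and integer indexing, with plain lists standing in for real datasets. Neither wrapper changes what is read from a shard, so option 1 still pays the full load cost.

```python
from typing import Any, Dict, Iterable, Mapping, Sequence


class ColumnSubset:
    """Option 1 (hypothetical helper): keep only the requested keys.

    The wrapped dataset still decodes the full sample, so no load time is
    saved; this only narrows what downstream code sees.
    """

    def __init__(self, dataset: Sequence[Mapping[str, Any]], columns: Iterable[str]):
        self.dataset = dataset
        self.columns = list(columns)

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        sample = self.dataset[index]
        # Raises KeyError if a requested column is missing from the sample.
        return {name: sample[name] for name in self.columns}


class StitchedColumns:
    """Option 2 (hypothetical helper): one dataset per field, merged by index.

    Assumes every per-field dataset stores its samples in the same order.
    """

    def __init__(self, per_field: Mapping[str, Sequence[Any]]):
        lengths = {len(ds) for ds in per_field.values()}
        if len(lengths) > 1:
            raise ValueError("per-field datasets must have equal length")
        self.per_field = dict(per_field)
        self._length = lengths.pop() if lengths else 0

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> Dict[str, Any]:
        # Read the same index from every field and stitch the values together.
        return {name: ds[index] for name, ds in self.per_field.items()}


# Stand-in data: plain lists instead of real shard-backed datasets.
full = [
    {"image": "img0", "label": 0, "caption": "a cat"},
    {"image": "img1", "label": 1, "caption": "a dog"},
]
subset = ColumnSubset(full, columns=["image", "label"])

stitched = StitchedColumns({
    "image": ["img0", "img1"],
    "label": [0, 1],
})
```

With these stand-ins, `subset[1]` and `stitched[1]` both yield a dict containing only `image` and `label`. The coordination problem in option 2 shows up here as the requirement that a shuffled index be applied identically to every per-field dataset.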

## Additional context
