Exception thrown when dataframes are passed as input to ParallelRunStep class #148

@manojkumar-github

Description

It would be very helpful if the ParallelRunStep class accepted pandas DataFrames as inputs.

I understand that the ParallelRunStep class currently accepts only these input types: [DatasetConsumptionConfig, PipelineOutputFileDataset, PipelineOutputTabularDataset, OutputFileDatasetConfig, OutputTabularDatasetConfig, LinkFileOutputDatasetConfig, LinkTabularOutputDatasetConfig]

Would it be possible to allow DataFrames as inputs to ParallelRunStep? Is this a use case the Azure ML dev team would consider?

Passing a DataFrame in `inputs` currently raises the following exception:

Exception                                 Traceback (most recent call last)
<ipython-input-27-215e373515cb> in <module>
      7     output=output_dir,
      8     allow_reuse=False,
----> 9     arguments=None
     10 )

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/steps/parallel_run_step.py in __init__(self, name, parallel_run_config, inputs, output, side_inputs, arguments, allow_reuse)
    155             side_inputs=side_inputs,
    156             arguments=arguments,
--> 157             allow_reuse=allow_reuse,
    158         )
    159 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in __init__(self, name, parallel_run_config, inputs, output, side_inputs, arguments, allow_reuse)
    259 
    260         self._process_inputs_output_dataset_configs()
--> 261         self._validate()
    262         self._get_pystep_inputs()
    263 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _validate(self)
    329         """Validate input params to init parallel run step class."""
    330         self._validate_arguments()
--> 331         self._validate_inputs()
    332         self._validate_output()
    333         self._validate_parallel_run_config()

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _validate_inputs(self)
    410 
    411         if self._inputs:
--> 412             self._input_ds_type = self._get_input_type(self._inputs[0])
    413             for input_ds in self._inputs:
    414                 if self._input_ds_type != self._get_input_type(input_ds):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/_parallel_run_step_base.py in _get_input_type(self, in_ds)
    399             ds_mapping_type = INPUT_TYPE_DICT[input_type]
    400         else:
--> 401             raise Exception("Step input must be of any type: {}, found {}".format(ALLOWED_INPUT_TYPES, input_type))
    402         return ds_mapping_type
    403 

Exception: Step input must be of any type: (<class 'azureml.data.dataset_consumption_config.DatasetConsumptionConfig'>, <class 'azureml.pipeline.core.pipeline_output_dataset.PipelineOutputFileDataset'>, <class 'azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset'>, <class 'azureml.data.output_dataset_config.OutputFileDatasetConfig'>, <class 'azureml.data.output_dataset_config.OutputTabularDatasetConfig'>, <class 'azureml.data.output_dataset_config.LinkFileOutputDatasetConfig'>, <class 'azureml.data.output_dataset_config.LinkTabularOutputDatasetConfig'>), found <class 'pandas.core.frame.DataFrame'>
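
For anyone hitting the same exception, one possible workaround is to register the DataFrame as a tabular dataset first and pass the resulting named input to the step. This is a minimal sketch, not an official recommendation. It assumes an azureml-core version that provides `Dataset.Tabular.register_pandas_dataframe` (marked experimental in the SDK), and the helper name `dataframe_to_step_input` and its parameters are hypothetical.

```python
def dataframe_to_step_input(df, datastore, dataset_name, input_name):
    """Register a pandas DataFrame as a TabularDataset and wrap it as a step input.

    Hypothetical helper, not part of the Azure ML SDK.
    """
    # Imported lazily so this module loads even where azureml-core is absent.
    from azureml.core import Dataset

    # Upload the DataFrame to the datastore and register it in the workspace.
    tabular_ds = Dataset.Tabular.register_pandas_dataframe(
        dataframe=df,
        target=(datastore, dataset_name),  # (datastore, relative path) tuple
        name=dataset_name,
        show_progress=False,
    )

    # as_named_input() returns a DatasetConsumptionConfig, which is one of
    # the input types listed in the exception message above.
    return tabular_ds.as_named_input(input_name)


# Example usage (assumes an existing Workspace `ws` and DataFrame `df`):
#   datastore = ws.get_default_datastore()
#   step_input = dataframe_to_step_input(df, datastore, "my_dataframe", "input_ds")
#   ParallelRunStep(..., inputs=[step_input], ...)
```

With a tabular input like this, the entry script's `run()` would be expected to receive mini-batches of rows rather than the original in-memory DataFrame object.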
