Our pipeline workflow differs from the pipeline workflow of scikit-learn.
For example, a typical scikit-learn workflow is:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
pipe.fit(X_train, y_train)
Note that X_train there is a plain np.ndarray with shape (samples, features). We do not have that: our X generally consists of several resampled xr.Datasets, one per precursor field.
For us, a realistic pipeline would look like:
RF = sklearn.models.RF(...)
Pipeline([RGDR(y).fit(sst_precursor), RGDR(y).fit(z200_precursor), EOF.fit(OLR_precursor), 'merger_of_features', 'feature_selection', RF])
Originally posted by @semvijverberg in #71 (comment)
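To make the difference concrete, here is a minimal sketch of the "realistic pipeline" described above. It is not a proposed implementation. `TopCorrelationReducer` is a hypothetical stand-in for RGDR/EOF, the precursor fields are random NumPy arrays rather than resampled xr.Datasets, and a regressor is assumed for the RF step. The point is only that each precursor needs its own fitted reduction step before the merged (samples, features) array can enter a scikit-learn `Pipeline`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import Pipeline


class TopCorrelationReducer:
    """Hypothetical stand-in for RGDR/EOF: reduces a gridded field to a few features."""

    def __init__(self, n_features=3):
        self.n_features = n_features

    def fit(self, field, y):
        # Flatten (samples, lat, lon) to (samples, gridpoints).
        flat = field.reshape(field.shape[0], -1)
        # Pearson correlation of every gridpoint with the target.
        flat_c = flat - flat.mean(axis=0)
        y_c = y - y.mean()
        corr = (flat_c * y_c[:, None]).sum(axis=0) / np.sqrt(
            (flat_c**2).sum(axis=0) * (y_c**2).sum()
        )
        # Keep the gridpoints with the strongest absolute correlation.
        self.selected_ = np.argsort(-np.abs(corr))[: self.n_features]
        return self

    def transform(self, field):
        flat = field.reshape(field.shape[0], -1)
        return flat[:, self.selected_]


# Synthetic precursor fields (assumption: real inputs would be xr.Datasets).
rng = np.random.default_rng(0)
n_train, n_test = 80, 20
n_samples = n_train + n_test
precursors = {
    name: rng.normal(size=(n_samples, 4, 5)) for name in ("sst", "z200", "olr")
}
y = precursors["sst"][:, 0, 0] + 0.1 * rng.normal(size=n_samples)

train, test = slice(0, n_train), slice(n_train, None)

# One reducer per precursor, fitted on training samples only.
reducers = {
    name: TopCorrelationReducer(n_features=3).fit(field[train], y[train])
    for name, field in precursors.items()
}


def merge_features(part):
    # 'merger_of_features': stack the per-precursor features column-wise.
    return np.hstack(
        [reducers[name].transform(precursors[name][part]) for name in precursors]
    )


X_train, X_test = merge_features(train), merge_features(test)

# From here on X is a plain (samples, features) array, so sklearn applies as usual.
pipe = Pipeline(
    [
        ("feature_selection", SelectKBest(f_regression, k=4)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ]
)
pipe.fit(X_train, y[train])
predictions = pipe.predict(X_test)
```

In this sketch the per-precursor reducers sit outside the scikit-learn `Pipeline`, because a standard `Pipeline` passes a single X from step to step and cannot take several input fields. Whether to wrap that first stage in our own pipeline object is the design question this issue raises.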