Refactor cross-validator #3

@geek-yang

Description

The cross-validator should be datetime-aware to avoid information leakage between the train and test sets. Currently, the cross-validator is decoupled from the datetime target information and from the lags.
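A minimal illustration of the leakage problem, using only the Python standard library. It assumes daily targets in a June-August season (an arbitrary choice for the example) and compares a date-unaware random split with one that keeps whole years together. `season_dates` and `leaking_years` are hypothetical helpers, not RGCPD functions. With the random split, neighbouring (and therefore strongly autocorrelated) days of the same season end up on both sides of the split.

```python
import random
from datetime import date, timedelta


def season_dates(start_year, end_year):
    """Daily dates from June 1 to August 31 of every year (illustrative season)."""
    dates = []
    for year in range(start_year, end_year + 1):
        d = date(year, 6, 1)
        while d <= date(year, 8, 31):
            dates.append(d)
            d += timedelta(days=1)
    return dates


def leaking_years(dates, test_mask):
    """Years that contribute dates to both the training and the test set."""
    train_years = {d.year for d, is_test in zip(dates, test_mask) if not is_test}
    test_years = {d.year for d, is_test in zip(dates, test_mask) if is_test}
    return train_years & test_years


dates = season_dates(2000, 2009)
rng = random.Random(1)

# Date-unaware split: each day is sent to the test set independently (~10%).
random_mask = [rng.random() < 0.1 for _ in dates]

# Date-aware split: whole years are sent to the test set together.
test_yrs = {2003, 2007}
year_mask = [d.year in test_yrs for d in dates]

print(sorted(leaking_years(dates, random_mask)))  # almost surely all of 2000-2009
print(sorted(leaking_years(dates, year_mask)))    # []
```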

Currently, the datetime target information is set here:

```python
start_end_TVdate=None,
tfreq: int=10,
start_end_date: Tuple[str, str]=None,
start_end_year: Tuple[int, int]=None,
```

and the train/test split is set here:

```python
def traintest(self, method: Union[str, bool]=None, seed=1,
```

Once both of these are known, the following function is called:

`proto/RGCPD/functions_pp.py`, lines 1382 to 1384 in commit 0dd397f:

```python
def cross_validation(RV_ts, traintestgroups=None, test_yrs=None, method=str,
                     seed=None, gap_prior: int=None, gap_after: int=None):
#%%
```
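One plausible reading of `gap_prior` and `gap_after` is the number of years dropped from the training set immediately before and after the test years, so that autocorrelation cannot carry information across the split. The sketch below implements a leave-n-years-out split under that assumption. `split_years` and `leave_n_years_out` are hypothetical names, and the real `cross_validation` (with its various `method` options) may behave differently.

```python
def split_years(years, test_yrs, gap_prior=0, gap_after=0):
    """Return (train_years, test_years) for one fold.

    Years from `gap_prior` years before to `gap_after` years after each test
    year (the test years included) are removed from the training set.
    """
    test = sorted(set(test_yrs))
    excluded = {y for t in test for y in range(t - gap_prior, t + gap_after + 1)}
    train = [y for y in years if y not in excluded]
    return train, test


def leave_n_years_out(years, n, gap_prior=0, gap_after=0):
    """Yield one (train, test) split per block of `n` consecutive years."""
    years = sorted(years)
    for i in range(0, len(years), n):
        yield split_years(years, years[i:i + n], gap_prior, gap_after)


years = list(range(2000, 2010))
folds = list(leave_n_years_out(years, n=2, gap_prior=1, gap_after=1))
train, test = folds[1]
print(test)   # [2002, 2003]
print(train)  # [2000, 2005, 2006, 2007, 2008, 2009]
```

Dropping the neighbouring years trades some training data for protection against autocorrelation leaking information across the train/test boundary.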

These functions are used for lag shifting:

```python
def apply_shift_lag(fit_masks, lag_i):

def _check_y_fitmask(fit_masks, lag_i, base_lag):
```
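The exact behaviour of `apply_shift_lag` and `_check_y_fitmask` is not spelled out here, so the following is only a sketch of the general idea, assuming a contiguous, regularly spaced time axis and lags counted in time steps. `shift_mask` and `overlapping_steps` are hypothetical helpers. It shows that once precursors are taken `lag` steps before the target, a training sample can draw its precursor from inside the test period, which a date-aware cross-validator would have to catch.

```python
def shift_mask(mask, lag):
    """Map a target mask onto the precursor time axis.

    Element t of the result is True when target step t + lag is True, i.e.
    the precursor is read `lag` steps before the target (lag >= 0 assumed).
    Steps whose target partner falls outside the series are False.
    """
    n = len(mask)
    return [t + lag < n and mask[t + lag] for t in range(n)]


def overlapping_steps(train_x_mask, test_y_mask):
    """Time steps used as training precursors that are also test targets."""
    return [t for t, (x, y) in enumerate(zip(train_x_mask, test_y_mask)) if x and y]


# Ten regularly spaced time steps; steps 5 and 6 are test targets.
test_y = [t in (5, 6) for t in range(10)]
train_y = [not y for y in test_y]

train_x = shift_mask(train_y, lag=2)
print(overlapping_steps(train_x, test_y))  # [5, 6]
```

With `lag=0` the overlap is empty; it is the lag shift alone that pulls test-period values into training, which is why the split and the lags need to be handled together.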

I'm not 100% sure whether the following is still used in the code:

```python
def func_dates_min_lag(dates, lag):
```

Alternatively, we could create a new package/module that combines the datetime target information, the lags and the train/test split in one place.

I propose to call it S2SCV.
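To make the S2SCV proposal more concrete, here is a minimal sketch of what such an object could look like. Everything in it is hypothetical: the class layout, the method names, the "MM-DD" format for `start_end_date`, lags expressed in days, and the rule that a training sample is dropped if either its target date or its precursor date (target minus lag) falls in a blocked year. It only borrows the parameter names `tfreq`, `start_end_date`, `start_end_year`, `gap_prior` and `gap_after` from the existing code, and it assumes the target season does not cross a year boundary.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterator, List, Tuple


@dataclass
class S2SCV:
    """Hypothetical sketch: target dates, lags and train/test split in one object."""

    start_end_date: Tuple[str, str]  # season as ("MM-DD", "MM-DD"), assumed format
    start_end_year: Tuple[int, int]  # first and last year, inclusive
    tfreq: int = 10                  # days between consecutive target dates
    lags: Tuple[int, ...] = (0,)     # precursor lead times in days (assumed unit)

    def years(self) -> List[int]:
        return list(range(self.start_end_year[0], self.start_end_year[1] + 1))

    def target_dates(self) -> List[date]:
        """One target date every `tfreq` days within the season, for every year."""
        m0, d0 = (int(x) for x in self.start_end_date[0].split("-"))
        m1, d1 = (int(x) for x in self.start_end_date[1].split("-"))
        dates = []
        for year in self.years():
            d, end = date(year, m0, d0), date(year, m1, d1)
            while d <= end:
                dates.append(d)
                d += timedelta(days=self.tfreq)
        return dates

    def split(self, n_test_years: int = 1, gap_prior: int = 0, gap_after: int = 0
              ) -> Iterator[Tuple[int, List[int], List[date], List[date]]]:
        """Yield (lag, test_yrs, train_dates, test_dates) for every fold and lag.

        Years from `gap_prior` before to `gap_after` after each test year are
        blocked. A training target is kept only if neither its own year nor the
        year of its precursor date (target minus lag days) is blocked.
        """
        dates = self.target_dates()
        years = self.years()
        for i in range(0, len(years), n_test_years):
            test_yrs = years[i:i + n_test_years]
            blocked = {y for t in test_yrs
                       for y in range(t - gap_prior, t + gap_after + 1)}
            test = [d for d in dates if d.year in test_yrs]
            for lag in self.lags:
                train = [d for d in dates
                         if d.year not in blocked
                         and (d - timedelta(days=lag)).year not in blocked]
                yield lag, test_yrs, train, test


cv = S2SCV(("06-01", "08-31"), (2000, 2004), tfreq=10, lags=(0, 200))
for lag, test_yrs, train, test in cv.split(n_test_years=1):
    if test_yrs == [2002]:
        print(lag, len(train), len(test))
# prints:
# 0 40 10
# 200 35 10
```

With a 200-day lag, the first five target dates of 2003 are also removed from training because their precursor dates fall in the 2002 test year, something a split that ignores the lags would miss.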
