Description
The cross-validator should be date-time aware to avoid train/test information leakage. At the moment it is decoupled from the date-time target information and from the lags.
The date-time target information is currently decided here:
Lines 58 to 61 in 0dd397f
    start_end_TVdate=None,
    tfreq: int=10,
    start_end_date: Tuple[str, str]=None,
    start_end_year: Tuple[int, int]=None,
and train-test information is decided here:
Line 314 in 0dd397f
    def traintest(self, method: Union[str, bool]=None, seed=1,
Once both pieces of information are known, this function is called:
Lines 1382 to 1384 in 0dd397f
    def cross_validation(RV_ts, traintestgroups=None, test_yrs=None, method=str,
                         seed=None, gap_prior: int=None, gap_after: int=None):
        #%%
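To illustrate what a gap around the test period could mean, here is a minimal sketch. It is not the repository's `cross_validation`. The helper name `split_with_gap` is hypothetical, and interpreting `gap_prior` and `gap_after` as a number of days is an assumption.

```python
from datetime import date, timedelta


def split_with_gap(dates, test_years, gap_prior=0, gap_after=0):
    """Return boolean (train, test) lists aligned with `dates`.

    Hypothetical helper: the gaps are assumed to be counted in days.
    """
    test_years = set(test_years)
    test = [d.year in test_years for d in dates]

    # One exclusion window per test year: the test block itself,
    # widened by gap_prior days before and gap_after days after.
    windows = []
    for year in sorted(test_years):
        block = [d for d in dates if d.year == year]
        if block:
            windows.append((min(block) - timedelta(days=gap_prior),
                            max(block) + timedelta(days=gap_after)))

    # A date is usable for training only if it lies outside every window.
    train = [not any(lo <= d <= hi for lo, hi in windows) for d in dates]
    return train, test


# Usage: daily dates for 2000-2002, with 2001 held out as the test year.
dates = [date(2000, 1, 1) + timedelta(days=i) for i in range(1096)]
train, test = split_with_gap(dates, test_years=[2001], gap_prior=10, gap_after=5)
```

With these settings, the last 10 days of 2000 and the first 5 days of 2002 are dropped from the training set, so no training sample directly borders the test year.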
These functions are used for lag shifting:
proto/forecasting/func_models.py
Line 153 in 0dd397f
    def apply_shift_lag(fit_masks, lag_i):
proto/forecasting/func_models.py
Line 122 in 0dd397f
    def _check_y_fitmask(fit_masks, lag_i, base_lag):
I'm not 100% sure whether the following is still used in the code:
Line 1730 in 0dd397f
    def func_dates_min_lag(dates, lag):
Alternatively, we could create a new package/module that combines the date-time target information, the lags and the train/test information in one place.
I propose to call it: S2SCV
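As a rough sketch of what such a combined object might look like (purely illustrative: the class name `LagAwareSplit`, its fields and the overlap rule are all assumptions, not existing code), each sample is treated as spanning the interval from its earliest predictor date to its target date, and training samples whose interval overlaps a test sample's interval are dropped:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class LagAwareSplit:
    """Hypothetical container tying target dates, lags and test years together."""
    target_dates: tuple    # dates on which the target is defined
    lags: tuple            # predictor lags, assumed to be in days
    test_years: frozenset  # years held out for testing

    def span(self, d):
        # Interval a sample touches: earliest predictor date up to the target date.
        return d - timedelta(days=max(self.lags)), d

    def split(self):
        test = [d for d in self.target_dates if d.year in self.test_years]
        test_spans = [self.span(d) for d in test]
        train = []
        for d in self.target_dates:
            if d.year in self.test_years:
                continue
            lo, hi = self.span(d)
            # Closed intervals [lo, hi] and [t_lo, t_hi] intersect
            # iff lo <= t_hi and t_lo <= hi; such samples are dropped.
            if any(lo <= t_hi and t_lo <= hi for t_lo, t_hi in test_spans):
                continue
            train.append(d)
        return train, test


# Usage: daily targets for 2000-2002, lags of 0 and 20 days, 2001 as test year.
days = tuple(date(2000, 1, 1) + timedelta(days=i) for i in range(1096))
cv = LagAwareSplit(target_dates=days, lags=(0, 20), test_years=frozenset({2001}))
train_dates, test_dates = cv.split()
```

Because the lags are known to the splitter, the size of the excluded region follows from the largest lag instead of being set by hand. Whether interval overlap is the right leakage criterion for this project is an open design question.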