Conversation
Pull Request Overview
This PR, titled "Normalisation fix", addresses several issues related to data normalization and batch preparation across the project’s data loading, model training, evaluation, and configuration. Key changes include:
- Updates to the CSV-based dataset classes to enable consistent per-sample and global normalization.
- Modifications to the collate functions and DataLoader construction to ensure that batches are prepared correctly for multiple model types.
- Updates to model implementations (e.g. Autoformer, Transformer, TimesNet) and configuration files to include explicit sequence and prediction lengths.
Reviewed Changes
Copilot reviewed 27 out of 45 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| cap/data/data.py | Updated CSVSequenceDataset and FedformerSequenceDataset normalization and batching |
| cap/training/trainer.py | Unified batch preparation across models using prepare_batch; added slicing for fedformer/timesnet outputs |
| cap/training/evaluator.py | Revised evaluation function to accumulate losses based on sample count rather than batch count |
| cap/models/Autoformer.py | Adjusted the shape of the zeros tensor for trend decomposition and output slicing |
| Various config files | Updated dataset paths, normalization flags, and added seq_len/pred_len parameters |
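The sample-count weighting described for cap/training/evaluator.py can be sketched as follows. This is a minimal illustration with hypothetical names and a bare loop, not the project's actual `evaluate_model` (which also routes batches through `prepare_batch`):

```python
import torch

def evaluate_model(model, loader, criterion):
    """Accumulate loss weighted by sample count rather than batch count.

    Averaging per-batch losses over-weights a smaller final batch;
    weighting each batch loss by its size yields the true per-sample mean.
    """
    model.eval()
    total_loss, total_samples = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            loss = criterion(model(x), y)
            n = y.size(0)                  # samples in this batch
            total_loss += loss.item() * n  # weight by sample count
            total_samples += n
    return total_loss / max(total_samples, 1)
```

With a batch of 3 samples at loss 1.0 followed by a batch of 1 sample at loss 0.0, batch averaging would report 0.5, while sample weighting reports the correct 0.75.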
Comments suppressed due to low confidence (2)
cap/models/Autoformer.py:115
- Verify that changing the zeros tensor’s shape to use a fixed dimension of 1 (instead of x_dec.shape[2]) correctly aligns with the expected output dimensions; a mismatch here could lead to subtle errors in the trend and seasonal component recombination.
zeros = torch.zeros([x_dec.shape[0], self.pred_len, 1]).to(x_dec.device)
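To see why this shape change is subtle: a mismatched last dimension often will not raise an error, because PyTorch broadcasting silently expands a size-1 axis. A small illustration with hypothetical shapes (not the model's real configuration):

```python
import torch

batch, pred_len, d_features = 4, 24, 7  # hypothetical sizes

# The PR's version: a fixed last dimension of 1.
zeros = torch.zeros(batch, pred_len, 1)

# If the trend/seasonal components are single-channel, recombination is exact:
single = torch.randn(batch, pred_len, 1)
trend = single + zeros
assert trend.shape == (batch, pred_len, 1)

# A multi-channel component does NOT raise: the size-1 axis broadcasts,
# which is exactly the kind of silent mismatch the review warns about.
multi = torch.randn(batch, pred_len, d_features)
silently_broadcast = multi + zeros
assert silently_broadcast.shape == (batch, pred_len, d_features)
```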
cap/data/data.py:175
- Consider adding a comment to explain why the normalized output is converted to a list (using .tolist()) instead of remaining as a NumPy array or tensor, to clarify the design decision for downstream processing.
out = ((out_arr - self.y_mean) / self.y_std).tolist()
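The pattern under discussion can be sketched as below. `NormalizedTargets` is a hypothetical helper illustrating global y-normalization with a `.tolist()` conversion, not the project's actual dataset class:

```python
import numpy as np

class NormalizedTargets:
    """Sketch of global target normalization (hypothetical helper)."""

    def __init__(self, y):
        y = np.asarray(y, dtype=np.float64)
        self.y_mean = y.mean()
        self.y_std = y.std() or 1.0  # guard against zero variance

    def normalize(self, out_arr):
        # .tolist() hands plain Python floats to downstream collate
        # functions that may not expect NumPy scalars or arrays.
        return ((np.asarray(out_arr) - self.y_mean) / self.y_std).tolist()
```

A comment like the one in `normalize` above is the kind of clarification the review is asking for.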
Updated hyperparameters
Pull Request Overview
This PR refactors data loading to support CSV-based datasets with optional normalization, standardizes batch preparation across models, and introduces device fallback and sequence-length overrides for certain models.
- Add CSVSequenceDataset and update get_dataloaders to handle CSV inputs and normalization flags
- Introduce BaseTimeSeriesModel and implement prepare_batch in all time-series models
- Unify training/evaluation loops with model-agnostic batch handling; add device fallback in train_et_model.py
Reviewed Changes
Copilot reviewed 27 out of 45 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| train_et_model.py | Fallback to CPU on missing CUDA and override seq/pred lengths for Fedformer/TimesNet |
| requirement.txt | Loosened numpy/scipy version constraints |
| experiment/main.py | Reformatted arg parsing and unified dataloader invocation |
| cap/utils/base.py | Added BaseTimeSeriesModel for shared batch logic |
| cap/training/trainer.py | Extended train_model to support all model types and unified batch processing |
| cap/training/evaluator.py | Unified evaluate_model with model-agnostic batch handling |
| cap/models/transformer.py | Added prepare_batch method |
| cap/models/lstm.py | Subclassed BaseTimeSeriesModel and added prepare_batch |
| cap/models/TimesNet.py | Implemented prepare_batch |
| cap/models/Informer.py | Subclassed and added prepare_batch |
| cap/models/FEDFormer.py | Added prepare_batch |
| cap/models/Autoformer.py | Subclassed and added prepare_batch, fixed output slicing |
| cap/data/data.py | Introduced CSVSequenceDataset, FedformerSequenceDataset, and revamped get_dataloaders |
| cap/configs/*.yaml | Updated sample configs for normalization and seq/pred lengths |
| cap/main.py | Added TimesNet support in CLI |
Comments suppressed due to low confidence (1)
requirement.txt:1
- [nitpick] Filename requirement.txt is unconventional; consider renaming to requirements.txt to follow Python packaging conventions.
# Core dependencies
```diff
-def train_model(train_loader, valid_loader, input_dim, output_dim, seq_len, pred_len,
-                hidden_dim=128, num_layers=2, epochs=1, lr=0.01, patience=5,
-                device="cuda" if torch.cuda.is_available() else "cpu", model_type='lstm'):
+def train_model(
```
The num_layers parameter is required but has no default, yet calls to train_model only pass hidden_dim by keyword. Consider providing a default for num_layers or making it a keyword-only argument to avoid missing-argument errors.
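One way to address this, sketched below with a trivial body for illustration: restore a default for num_layers and make the tuning knobs keyword-only, so existing calls that pass only hidden_dim keep working. This is a suggestion under the assumption that callers always use keywords for these parameters, not the PR's actual code:

```python
import torch

def train_model(train_loader, valid_loader, input_dim, output_dim,
                seq_len, pred_len, *,
                hidden_dim=128, num_layers=2, epochs=1, lr=0.01, patience=5,
                device="cuda" if torch.cuda.is_available() else "cpu",
                model_type="lstm"):
    # ... training loop elided; return the resolved config for illustration ...
    return {"hidden_dim": hidden_dim, "num_layers": num_layers}

# A call that previously risked a missing-argument error now succeeds:
cfg = train_model(None, None, 8, 1, 96, 24, hidden_dim=64)
```

The bare `*` makes everything after it keyword-only, so positional call sites cannot silently misalign arguments when the signature grows.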
```python
ds = CSVSequenceDataset(path,
                        seq_len=seq_len,
                        pred_len=pred_len,
                        normalization=normalization)
```
The FedformerSequenceDataset instance is immediately overwritten by CSVSequenceDataset on the next line, so the specialized Fedformer loader never gets used. You can fix this by using elif or scoping the CSV loader under an else branch.
Suggested change:

```diff
-ds = CSVSequenceDataset(path,
-                        seq_len=seq_len,
-                        pred_len=pred_len,
-                        normalization=normalization)
+elif model_type.lower() == 'csv':
+    ds = CSVSequenceDataset(path,
+                            seq_len=seq_len,
+                            pred_len=pred_len,
+                            normalization=normalization)
```
```python
'e_layers': 1,
'd_layers': 1,
'dropout': 0.05,
'dropout': 0.05,
```
[nitpick] Duplicate 'dropout' key in the kwargs dict; the second entry overrides the first. Removing one will reduce confusion.
Suggested change:

```diff
-'dropout': 0.05,
```
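Python dict literals accept repeated keys without raising; the last occurrence wins, so the duplicate is invisible at runtime. A quick illustration (values chosen to differ so the override is visible, unlike the identical 0.05 entries in the PR):

```python
# A repeated key in a dict literal silently overwrites the earlier one;
# no error or warning is raised by the interpreter itself.
kwargs = {
    'e_layers': 1,
    'd_layers': 1,
    'dropout': 0.05,
    'dropout': 0.10,  # silently replaces the 0.05 entry above
}
```

Many linters flag repeated dictionary keys, which is the easiest way to catch this class of mistake.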
No description provided.