traffic forecasting using time-moe #2339
base: next
Conversation
BP-Ent commented on 2025-10-24T22:34:23Z:
The dataset comprises vehicular traffic volume counts recorded at six distinct urban intersections at 15-minute intervals, yielding 96 observations per day. In total, the dataset contains 5,376 time-stamped data points, representing continuous traffic flow monitoring over several days. The following cell downloads the data:
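A minimal sketch of what loading such a dataset can look like, assuming a hypothetical CSV file `traffic_counts.csv` with a timestamp column and one count column per intersection (the notebook's actual download cell may fetch the data differently, e.g., as a hosted item):

```python
import pandas as pd

# Hypothetical file and column names; the notebook's download cell
# may instead fetch the data as a hosted item.
df = pd.read_csv("traffic_counts.csv", parse_dates=["timestamp"])
df = df.set_index("timestamp").sort_index()

# Sanity checks: 15-minute spacing and 96 observations per day.
print(pd.infer_freq(df.index))         # expected: '15min' (or '15T')
print(df.resample("D").size().head())  # expected: 96 rows per full day
```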
BP-Ent commented on 2025-10-24T22:34:23Z:
While deep learning models can learn complex patterns automatically, ACF plots remain a valuable exploratory tool: they help in understanding the data, identifying how strongly past time steps influence future values, guiding the selection of input sequence length for model design, detecting seasonality (for example, periodic peaks and troughs) and lag patterns, and validating results.
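As a hedged illustration, the ACF can be plotted with statsmodels; `series` is assumed to be the 15-minute count series for one intersection:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# 192 lags span two days (2 x 96 intervals), enough to expose
# a daily cycle as a correlation spike near lag 96.
fig, ax = plt.subplots(figsize=(12, 4))
plot_acf(series.dropna(), lags=192, ax=ax)
ax.set_title("ACF of 15-minute vehicle counts")
plt.show()
```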
BP-Ent commented on 2025-10-24T22:34:24Z:
In this method, we perform univariate time series forecasting: using only the historical values of vehicle counts, we forecast the next 96 time steps, i.e., 24 hours of future vehicle counts at 15-minute intervals. This setup follows a one-step univariate forecasting approach, where the model is trained to predict one time step ahead and then iteratively forecasts multiple steps. The dataset is preprocessed using the data preparation tool. The key input parameters required for the tool are:
Here, preprocessors are used to apply data normalization or standardization (e.g., MinMaxScaler or StandardScaler). Scaling is a common practice in deep learning to improve model convergence and performance, especially when using models that are sensitive to the scale of input features.
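For illustration, this is the kind of scaling such a preprocessor applies, sketched with scikit-learn directly (in the notebook it is typically wired in through the data preparation step rather than called by hand):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Assumes `counts` is a 1-D array of vehicle counts for one intersection.
counts = np.asarray(counts, dtype=float).reshape(-1, 1)

scaler = MinMaxScaler()                 # maps values into [0, 1]
scaled = scaler.fit_transform(counts)

# Forecasts made in the scaled space are inverted back to vehicle counts.
restored = scaler.inverse_transform(scaled)
```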
BP-Ent commented on 2025-10-24T22:34:25Z:
From the plot above, we can see that in urban traffic datasets collected at high temporal resolution, every 15 minutes in this case, there are frequent zero or near-zero vehicle counts, particularly during off-peak hours such as late at night or early in the morning. These low-activity periods introduce sparsity and non-uniform temporal patterns, which can lead to non-stationarity, class imbalance, and difficulty in capturing temporal dynamics. These characteristics must be carefully addressed during model training, especially when using deep learning architectures like LSTM or Transformer-based models that are sensitive to the distribution and continuity of input sequences. This is where fine-tuning a pretrained backbone helps: the model can adapt to the specific characteristics of traffic data, such as sparsity during off-peak hours, while retaining robust general features.
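The sparsity described above can be quantified directly; a small sketch, assuming `df` holds one count column per intersection:

```python
# Share of zero and near-zero 15-minute intervals per intersection.
zero_share = (df == 0).mean()
near_zero_share = (df <= 2).mean()      # illustrative near-zero threshold

print(zero_share.round(3))
print(near_zero_share.round(3))
```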
BP-Ent commented on 2025-10-24T22:34:26Z:
For model initialization, the preprocessed data, the backbone for training the model, and the sequence length are passed as parameters. The sequence length is a critical parameter and should be selected carefully; it is usually set to the periodicity of the data. You can experiment with different sequence lengths if sufficient computing resources are available.
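A hedged sketch of this initialization using `arcgis.learn`'s `TimeSeriesModel`; the `model_arch` string for the Time-MoE backbone is an assumption here, so check the notebook for the exact name it passes:

```python
from arcgis.learn import TimeSeriesModel

# `data` is the output of the data preparation step.
# seq_len=96 matches the daily periodicity: 96 fifteen-minute intervals.
# "TimeMoE" is an assumed backbone identifier, not a confirmed one.
model = TimeSeriesModel(data, seq_len=96, model_arch="TimeMoE")
```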
BP-Ent commented on 2025-10-24T22:34:26Z:
Since the model is pretrained, we use 25 epochs for fine-tuning. This number was determined after running a few trial iterations and was found to be sufficient for convergence without overfitting.
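A sketch of the fine-tuning call, assuming the standard `arcgis.learn` training API (`lr_find`, `fit`, and `plot_losses`); the learning rate is discovered rather than hard-coded:

```python
# Find a suitable learning rate, then fine-tune for 25 epochs.
suggested_lr = model.lr_find()
model.fit(25, lr=suggested_lr)

# Inspect training vs. validation loss to confirm convergence
# without overfitting.
model.plot_losses()
```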
BP-Ent commented on 2025-10-24T22:34:27Z:
Now, to ensure the model's effectiveness, the trained model is used to forecast traffic. First, we forecast 96 future timestamps at 15-minute intervals (96 × 15 minutes = 24 hours), thus covering one whole day, starting at 00:00:00, 00:15:00, 00:30:00, and so on, until 23:45:00 of the same day.
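The 96 forecast timestamps themselves are easy to construct; a small sketch with an illustrative start date:

```python
import pandas as pd

# One full day of 15-minute timestamps: 00:00:00 through 23:45:00.
future_index = pd.date_range("2025-01-01 00:00:00", periods=96, freq="15min")
print(future_index[0], "...", future_index[-1])
```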
BP-Ent commented on 2025-10-24T22:34:27Z:
Next, the predicted values are validated against the actual observed vehicle counts to assess the model's accuracy and generalization capability. This step is crucial for verifying the model's effectiveness in capturing temporal patterns and producing reliable forecasts in real-world traffic scenarios.
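A minimal sketch of that validation step, assuming `y_true` and `y_pred` hold the observed and forecasted counts for the held-out day:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)
print(f"MAE: {mae:.2f}  RMSE: {rmse:.2f}  R-squared: {r2:.3f}")
```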
BP-Ent commented on 2025-10-24T22:34:28Z:
Training and evaluation using Time-MoE
We will use the following script to train and evaluate all crossings simultaneously. This is a resource-intensive pretrained model, and it is recommended to run it on a system with sufficient memory and GPU compute to avoid out-of-memory errors.
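A hypothetical outline of such a script; `prepare_series` and `evaluate` are placeholders for the notebook's actual data preparation and scoring steps:

```python
# Train and evaluate one model per intersection (crossing).
results = {}
for crossing in df.columns:                  # one count column per crossing
    data = prepare_series(df[crossing])      # placeholder: data prep step
    model = TimeSeriesModel(data, seq_len=96, model_arch="TimeMoE")
    model.fit(25, lr=model.lr_find())
    results[crossing] = evaluate(model)      # placeholder: scoring step
```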
BP-Ent commented on 2025-10-24T22:34:28Z:
Training and evaluation using Bidirectional LSTM
BP-Ent commented on 2025-10-24T22:34:29Z:
Now let's compare how the model performs against a leading non-pretrained time series backbone, specifically a bidirectional LSTM architecture, using the same train-test split. This comparison assesses the value added by the pretrained Time-MoE model over traditional deep learning backbones on the same data.
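For reference, a minimal PyTorch sketch of the bidirectional LSTM architecture being compared against (the notebook's own training setup may differ):

```python
import torch
import torch.nn as nn

class BiLSTMForecaster(nn.Module):
    """One-step-ahead forecaster over a window of past counts."""
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            batch_first=True, bidirectional=True)
        # The bidirectional passes double the feature dimension.
        self.head = nn.Linear(2 * hidden_size, 1)

    def forward(self, x):                    # x: (batch, seq_len, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])      # predict the next time step

model = BiLSTMForecaster()
pred = model(torch.randn(8, 96, 1))          # -> shape (8, 1)
```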
BP-Ent commented on 2025-10-24T22:34:30Z:
To evaluate and compare model performance across all six intersections, the resulting performance metrics, such as R-squared, MAE, and RMSE, were compiled and compared against those obtained from a Bidirectional LSTM baseline. This comparative analysis helps assess the effectiveness of each model in capturing temporal traffic patterns and supporting downstream applications like signal timing optimization and adaptive traffic control. For example, let's compare the R-squared metric.
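A sketch of that comparison, assuming `timemoe_r2` and `bilstm_r2` are dictionaries of R-squared scores keyed by crossing (hypothetical variable names):

```python
import pandas as pd

comparison = pd.DataFrame({"Time-MoE": timemoe_r2, "Bi-LSTM": bilstm_r2})
ax = comparison.plot.bar(rot=0, figsize=(10, 4))
ax.set_ylabel("R-squared")
ax.set_title("Forecast accuracy per intersection")
```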
BP-Ent commented on 2025-10-24T22:34:31Z:
The pretrained Time-MoE model consistently outperforms the bidirectional LSTM across the crossing forecasts by leveraging temporal representations learned from large, diverse datasets, enabling it to capture complex seasonality, trends, and noise patterns that are difficult to learn from limited data. Because deep learning training is inherently stochastic, due to factors like weight initialization and batch sampling, you can train the model multiple times and evaluate the validation performance across runs. Once the best model instance is selected based on validation metrics, it can be used for forecasting.
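A hypothetical multi-run selection loop; `train_once` stands in for the notebook's data preparation and fitting steps, and `model.score()` for its validation metric:

```python
import torch

best_score, best_model = float("-inf"), None
for seed in (0, 1, 2, 3, 4):
    torch.manual_seed(seed)        # pin one source of stochasticity
    model = train_once(seed)       # placeholder: prep + fit
    score = model.score()          # placeholder: validation R-squared
    if score > best_score:
        best_score, best_model = score, model
# `best_model` is then used for forecasting.
```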
BP-Ent commented on 2025-10-24T22:34:31Z:
Fine-tuning enabled better adaptation to traffic-specific patterns, such as sparse or zero counts, while also accelerating convergence and enhancing robustness to data sparsity and noise. The forecasted traffic volumes were then applied to optimize adaptive signal control using Webster's formula as a delay estimation model, enabling dynamic traffic signal timing to reduce vehicle delays. However, limitations include potential domain mismatch reducing the transfer learning benefits, risks of overfitting on very small datasets, and greater computational demands compared to simpler models. Future work should explore spatio-temporal modeling, domain adaptation techniques, and integration of larger, spatially diverse real-time data to further enhance forecasting accuracy and adaptive traffic management in complex urban environments.
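For context, a minimal sketch of Webster's classic signal-timing relations, which underlie the delay-based optimization described above; the lost time and flow ratios below are illustrative, and the notebook's exact delay model may differ:

```python
def webster_cycle_length(lost_time_s: float, critical_flow_ratios) -> float:
    """Webster's optimal cycle length: C0 = (1.5 * L + 5) / (1 - Y), where
    L is total lost time per cycle (s) and Y is the sum of critical flow
    ratios y_i = q_i / s_i (demand over saturation flow) across phases."""
    Y = sum(critical_flow_ratios)
    assert Y < 1.0, "oversaturated intersection; formula not applicable"
    return (1.5 * lost_time_s + 5.0) / (1.0 - Y)

def green_splits(cycle_s: float, lost_time_s: float, critical_flow_ratios):
    """Allocate effective green time in proportion to each phase's y_i."""
    Y = sum(critical_flow_ratios)
    effective_green = cycle_s - lost_time_s
    return [effective_green * y / Y for y in critical_flow_ratios]

# Illustrative two-phase example driven by forecasted demand:
c0 = webster_cycle_length(10.0, [0.30, 0.25])       # about 44.4 s
print(round(c0, 1),
      [round(g, 1) for g in green_splits(c0, 10.0, [0.30, 0.25])])
```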
Suggested changes on reviewnb
@BP-Ent, all changes added, please check. Thanks!
<insert pull request description here>
Checklist
Please go through each entry in the below checklist and mark an 'X' if that condition has been met. Every entry should be marked with an 'X' to get the Pull Request approved.
- [ ] Are all imports in the first cell, including the `arcgis` imports? Note that in some cases, for samples, it is a good idea to keep the imports next to where they are used, particularly for uncommonly used features that we want to highlight.
- [ ] Are all `GIS` object instantiations one of the following?
  - `gis = GIS()`
  - `gis = GIS('home')` or `gis = GIS('pro')`
  - `gis = GIS(profile="your_online_portal")`
  - `gis = GIS(profile="your_enterprise_portal")`
- [ ] If the notebook requires setup and/or teardown, is the corresponding code added to `./misc/setup.py` and/or `./misc/teardown.py`?
- [ ] Is the data accessible to the `api_data_owner` user? If the notebook uses local data, upload it to the `api_data_owner` account and change the notebook to first download and unpack the files.
- [ ] Are images embedded as `<img src="base64str_here">` instead of `<img src="https://some.url">`? Do all map widgets contain a static image preview? (Call `mapview_inst.take_screenshot()` to do so.)
- [ ] Are all file paths constructed with `os.path.join()`? (Instead of `r"\foo\bar"`, use `os.path.join(os.path.sep, "foo", "bar")`, etc.)
- [ ] Is the data used in the `Export Training Data Using Deep Learning` tool published on the geosaurus org (api data owner account) and added in the notebook using the `gis.content.get` function?
- [ ] Are other required items added in the notebook using the `gis.content.get` function? Note: this includes providing the test raster and trained model.