This repository implements the time series embedding applications from "Dual-Purpose ATM Embeddings: From Synthetic Data Generation to Operational Analytics". It demonstrates how embeddings extracted from a Temporal Convolutional Variational Autoencoder (TCVAE) can provide operational insights for Air Traffic Management.
This project shows how generative models trained for trajectory synthesis can serve a dual purpose: their learned embeddings support operational pattern discovery, outlier detection, and representative trajectory extraction, all without manual feature engineering.
- Operational Pattern Identification: Discover distinct approach procedures and routing strategies through embedding-based clustering
- Trajectory Outlier Detection: Identify anomalous flight paths that deviate from standard operational patterns
- Representative Trajectory Extraction: Select core-set trajectories that preserve operational diversity while dramatically reducing dataset size
- Similarity Analysis: Quantify operational relationships between trajectories using embedding distances
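As a rough illustration of the similarity analysis item above, the sketch below compares embeddings with Euclidean and cosine distance and ranks the nearest neighbours of a query. The function names and the toy 2-D vectors are hypothetical and are not taken from this repository; real TCVAE embeddings would have the configured latent dimensionality.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; undefined for zero-length vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(query, embeddings, k=2, metric=euclidean):
    """Indices of the k embeddings closest to the query, nearest first."""
    ranked = sorted(range(len(embeddings)),
                    key=lambda i: metric(query, embeddings[i]))
    return ranked[:k]


# Toy embeddings (made up for illustration): two tight pairs far apart.
embeddings = [
    [0.0, 0.0],
    [0.1, 0.0],
    [3.0, 4.0],
    [3.1, 4.1],
]

print(euclidean(embeddings[0], embeddings[2]))            # 5.0
print(nearest_neighbors(embeddings[2], embeddings, k=2))  # [2, 3]
```

Which metric suits a given latent space is an empirical question; Euclidean distance is used as the default here only because it is the simplest choice.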
The Temporal Convolutional VAE uses dilated causal convolutions to capture multi-scale temporal dependencies in flight trajectories:
- Encoder: Stacked TCN layers with increasing dilation factors extract hierarchical temporal patterns
- Latent Space: Fixed-dimensional embeddings preserve essential spatiotemporal characteristics
- Applications: Embeddings enable downstream analysis tasks without trajectory reconstruction
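To make the idea of dilated causal convolutions concrete, here is a minimal pure-Python sketch of a single causal dilated 1-D convolution and of the receptive field of a stack of such layers. It assumes one convolution per layer and dilations of the form `dilation_base ** i`; the actual TCN blocks in this repository may differ (for example, residual blocks with two convolutions each would enlarge the receptive field). The function names are hypothetical.

```python
def causal_dilated_conv1d(signal, kernel, dilation):
    """y[t] = sum_j kernel[j] * signal[t - j * dilation].

    Samples before t = 0 are treated as zeros, so y[t] never
    depends on future inputs (the "causal" property).
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Time steps seen by one output of stacked causal convolutions,
    assuming a single convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


# An impulse at t = 3 only influences outputs at t = 3 and t = 3 + dilation.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(causal_dilated_conv1d(impulse, kernel=[1.0, 1.0], dilation=2))
# [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]

# Three layers with kernel size 16 and dilation base 2 (dilations 1, 2, 4).
dilations = [2 ** i for i in range(3)]
print(receptive_field(16, dilations))  # 106
```

Under these assumptions, increasing the dilation per layer widens the temporal context with depth while the number of weights per layer stays fixed.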
# Extract latent representations from trained TCVAE
latent_vectors = extract_latent_representations(model, trajectory_data)
# Apply analysis techniques
clusters = apply_hdbscan_clustering(latent_vectors)
outliers = detect_trajectory_outliers(latent_vectors, clusters)
representatives = select_representative_paths(latent_vectors, clusters)

# Install dependencies with Poetry
poetry install && poetry shell
# Start MLflow server for experiment tracking
mlflow server --host 127.0.0.1 --port 5000

Train embeddings on your trajectory data:
python scripts/train.py \
    --config configs/tcvae_config.yaml \
    --trajectories_dir data/ \
    --model_save_dir saved_models/

Apply embedding-based analysis to discover operational patterns:
python scripts/cluster.py \
    --config configs/tcvae_config.yaml \
    --trajectories_dir data/ \
    --model_save_dir saved_models/ \
    --save_path results/

Our paper shows embedding applications on three datasets:
- 7 distinct approach patterns automatically discovered
- Outlier detection identifies weather diversions and emergency procedures
- Representative extraction reduces 1000+ trajectories to 7 core examples
- 10 approach clusters reflecting complex airspace structure
- Multiple arrival corridors captured in embedding space
- Operational diversity preserved in representative trajectories
- Route-level patterns showing different airspace utilization strategies
- Unusual routing patterns flagged as outliers
- Core-set extraction enables efficient simulation scenarios
Embedding extraction is configured via YAML files:
model:
  type: TCVAE
  encoding_dim: 64          # Embedding dimensionality
  h_dims: [64, 64, 64]      # Encoder hidden layers
  kernel_size: 16           # Temporal receptive field
  dilation_base: 2          # Multi-scale pattern capture
data:
  features: ['latitude', 'longitude', 'altitude', 'speed', 'track']
  data_shape: 'image'       # Trajectory representation format
train:
  epochs: 500
  accelerator: 'gpu'

The framework provides multiple analytical techniques:
- PCA: Reduces embedding dimensionality for clustering
- UMAP/t-SNE: 2D visualization of embedding space structure
- HDBSCAN: Density-based clustering with automatic outlier detection
- Gaussian Mixture Models: Probabilistic clustering for operational regimes
- Embedding distances: Quantify trajectory operational similarity
- Medoid selection: Find most representative trajectory per cluster
- Density-based: HDBSCAN identifies trajectories outside cluster boundaries
- Operational significance: Outliers indicate unusual procedures or conditions
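The outlier and medoid steps listed above can be sketched in plain Python once cluster labels are available. This example assumes labels follow the HDBSCAN convention of marking noise points with `-1`; the helper names, toy embeddings, and labels are hypothetical and do not reproduce the repository's own utilities.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def outlier_indices(labels):
    """Indices of points labelled as noise (-1, the HDBSCAN convention)."""
    return [i for i, lab in enumerate(labels) if lab == -1]


def cluster_medoids(embeddings, labels):
    """Map each cluster label to the index of its medoid.

    The medoid is the member with the smallest summed distance
    to all other members of the same cluster.
    """
    members = defaultdict(list)
    for i, lab in enumerate(labels):
        if lab != -1:  # noise points never become representatives
            members[lab].append(i)

    medoids = {}
    for lab, idxs in members.items():
        medoids[lab] = min(
            idxs,
            key=lambda i: sum(euclidean(embeddings[i], embeddings[j])
                              for j in idxs),
        )
    return medoids


# Toy data (made up): two clusters and one noise point.
embeddings = [
    [0.0, 0.0], [1.0, 0.0], [2.0, 0.0],   # cluster 0
    [10.0, 10.0], [10.0, 11.0],           # cluster 1
    [50.0, 50.0],                         # noise
]
labels = [0, 0, 0, 1, 1, -1]

print(outlier_indices(labels))              # [5]
print(cluster_medoids(embeddings, labels))  # {0: 1, 1: 3}
```

Because a medoid is an actual data point, the selected representative is always a real trajectory rather than an averaged one. The pairwise sums make this quadratic in cluster size, which may need a faster approximation for large clusters.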
No manual feature engineering is required: embeddings capture operational patterns directly from raw trajectory data.
The same embedding approach works across different trajectory types (approaches, routes, complete flights).
Once the model is trained, embedding extraction is lightweight, which enables real-time operational analysis.
Clusters correspond to meaningful operational patterns; outliers indicate investigation-worthy anomalies.
Trajectory data should be provided as pickle files containing Traffic objects with:
# Assumes the `traffic` library provides the Traffic class
from traffic.core import Traffic

# Required trajectory features
features = ['latitude', 'longitude', 'altitude', 'speed', 'track']
# Traffic object structure
trajectories = Traffic.from_file("trajectory_data.pkl")
# Each trajectory: sequence of spatiotemporal measurements

This implementation enables:
- Operational benchmarking: Compare airport approach procedures
- Safety analysis: Identify unusual trajectory patterns for investigation
- Efficiency optimization: Extract representative trajectories for scenario planning
- Simulation: Generate realistic trajectory sets from learned patterns
- Monitoring: Real-time detection of operational anomalies
@article{murad2024dual,
title={Dual-Purpose ATM Embeddings: From Synthetic Data Generation to Operational Analytics},
author={Murad, Abdulmajid and Ruocco, Massimiliano},
journal={[arXiv]},
year={2024}
}

├── src/trajcluster/
│   ├── models/tcvae.py     # TCVAE implementation
│   ├── networks/           # TCN and neural network components
│   ├── vae/                # VAE base classes and latent regularization
│   └── utils/              # Embedding extraction and analysis utilities
├── scripts/
│   ├── train.py            # Train TCVAE embedding models
│   └── cluster.py          # Extract embeddings and perform analysis
└── configs/                # Model and training configurations

This research was conducted within the SynthAIr project, funded by the SESAR Joint Undertaking under the European Union's Horizon Europe research and innovation program (grant agreement No. 101114847).