Synthetic landing trajectories at Zurich Airport (LSZH) using TimeGAN.
Author: Sebastiaan Wijnands, except for the TimeGAN implementation (timegan.py and utils.py), which is taken directly from the original authors: https://github.com/jsyoon0823/TimeGAN
The Python files containing all functionality are bundled into three modules, reflecting the key steps of a typical machine learning workflow:
- Module I: Data Loading, Preprocessing, Labeling & Clustering
- Module II: TimeGAN Model Training, Post-Processing & Plotting
- Module III: Evaluation Framework - Diversity, Fidelity & Usefulness
Dependencies: cartes, geopandas, itertools, matplotlib, numpy, os, pandas, random, scipy, shapely, scikit-learn, tensorflow 1.x, tqdm, traffic, tslearn, warnings
The user guide follows the same module structure (outlined above). For the most granular information on individual functions and variables, please inspect the files directly: all functions have detailed docstrings, and the code contains detailed comments.
These scripts preprocess and label aircraft landing trajectory data for Zurich Airport (LSZH) covering October 1, 2019, to November 30, 2019. They provide functions to clean, process, visualize, and analyze landing trajectories, and support identifying different landing scenarios, such as go-arounds and holding patterns, as well as detecting arrival runways. Users can control the sampling rate, the complexity threshold, and which data files to generate and save in .npy/.npz format.
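As a minimal sketch of the saved data format (illustrative only; `resample`, `SEQ_LEN`, and the file name are assumptions, not taken from the repository): trajectories resampled to a fixed sequence length are stacked into an array of shape (number of samples, sequence length, number of features), split into train/test sets, and written in .npz format.

```python
import numpy as np

# Illustrative stand-in for loaded trajectories: each one is a
# (length_i, 3) array of (longitude, latitude, altitude) samples,
# with lengths that differ from flight to flight.
rng = np.random.default_rng(0)
trajectories = [rng.random((rng.integers(80, 120), 3)) for _ in range(10)]

SEQ_LEN = 100  # assumed target sequence length after resampling

def resample(traj, seq_len=SEQ_LEN):
    """Linearly interpolate each feature onto a fixed-length time grid."""
    old_t = np.linspace(0.0, 1.0, len(traj))
    new_t = np.linspace(0.0, 1.0, seq_len)
    return np.column_stack([np.interp(new_t, old_t, traj[:, f])
                            for f in range(traj.shape[1])])

# Stack into (samples, sequence length, features) as required later on.
data = np.stack([resample(t) for t in trajectories])  # shape (10, 100, 3)

# Shuffle, split into train/test sets, and save in .npz format.
idx = rng.permutation(len(data))
split = int(0.8 * len(data))
np.savez("lszh_landings.npz", train=data[idx[:split]], test=data[idx[split:]])
```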
- Preprocess Landing Data: Load and preprocess Zurich (LSZH) landing trajectories from October 1, 2019, to November 30, 2019.
- Convert Timestamps: Convert date-time columns to Unix timestamps and calculate elapsed time (since the start of a trajectory) for analysis.
- Detect and Filter Events: Identify go-arounds, holding patterns, and runway-specific landings (and assign labels accordingly).
- Visualize Trajectories: Plot aircraft trajectories and landing patterns on a map for visual analysis.
- Data Formatting: Prepare trajectory data for machine learning by converting to NumPy arrays and splitting into train/test sets.
- K-Means Clustering using DTW: Perform K-Means clustering on landing trajectories using Dynamic Time Warping (DTW) as the distance metric.
- Elbow Method Heuristic: Use the elbow method to determine the optimal number of clusters.
- Visualize Cluster Centers: Visualize the cluster centers or representative trajectories for each cluster.
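In practice the DTW clustering presumably goes through tslearn (listed as a dependency), where `TimeSeriesKMeans(n_clusters=k, metric="dtw").fit(X).inertia_` evaluated over a range of k yields the elbow curve. As a self-contained sketch of the underlying distance (an assumption, not the repository's code), a plain-NumPy DTW looks like:

```python
import numpy as np

def dtw(a, b):
    """O(n*m) dynamic time warping distance between two (length, features)
    trajectories, with Euclidean point-to-point cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# DTW aligns trajectories flown at different speeds: a trajectory and a
# time-stretched copy of it are close under DTW, whereas a fixed-index
# Euclidean comparison would penalize the misalignment.
t = np.linspace(0, 2 * np.pi, 50)
traj = np.column_stack([np.cos(t), np.sin(t)])            # unit circle
stretched = traj[np.minimum((np.arange(70) * 50) // 70, 49)]  # slower copy
print(dtw(traj, traj))       # 0.0
print(dtw(traj, stretched))  # ~0: duplicated points match at zero cost
```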
These scripts cover training the TimeGAN model and post-processing and visualizing the synthetic samples. The TimeGAN implementation is taken from the original authors' official repository (timegan.py and utils.py); the remaining files build on this model and its output data.
- TimeGAN Architecture: Model architecture as published by the original authors. Key command-line inputs are sequence length, module (RNN cell type), hidden dimensions, number of layers, number of iterations, and batch size.
- Training TimeGAN: Specify TimeGAN hyperparameter configurations and train the model to create synthetic aircraft landing trajectories from real data, then save the generated data and plot training losses.
- Data Post-Processing: Apply smoothing filters to the first three features (longitude, latitude, altitude) of trajectory data and save the processed data to separate files.
- Synthetic Data Visualization: Plot 2D (altitude profiles) and 3D (geographic coordinate profiles) visualizations of aircraft trajectories for comparison between different datasets, including real and synthetic data.
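The post-processing step can be sketched with SciPy. The repository does not specify the exact smoothing filter here, so a Savitzky-Golay filter is shown as one plausible choice; `smooth_first_three` and the file name are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_first_three(data, window=11, polyorder=3):
    """Smooth the first three features (longitude, latitude, altitude)
    of every trajectory; any remaining features are left untouched.

    data: array of shape (num_samples, sequence_length, num_features).
    """
    out = data.copy()
    # savgol_filter works along a chosen axis, so all samples and all
    # three features are filtered in a single vectorized call.
    out[:, :, :3] = savgol_filter(data[:, :, :3], window_length=window,
                                  polyorder=polyorder, axis=1)
    return out

rng = np.random.default_rng(1)
synthetic = rng.random((5, 100, 4))        # stand-in for generated data
smoothed = smooth_first_three(synthetic)
np.save("synthetic_smoothed.npy", smoothed)  # save to a separate file
```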
These scripts collectively evaluate the quality and usefulness of synthetic data compared to real data. They assess the similarity and diversity between datasets, train models to differentiate between real and synthetic data, and measure the effectiveness of synthetic data in predictive tasks. The evaluations are performed using various methods, including dimensionality reduction, model training, and statistical metrics. Required inputs are real and synthetic datasets of shape (number of samples, sequence length, feature dimension).
- Data Diversity: Assess the diversity of synthetic data by comparing its distribution to real data using dimensionality reduction techniques like PCA and t-SNE. Visualize how well the synthetic data represents the original data distribution (optionally flattening the data arrays along the temporal or feature dimensions).
- Data Fidelity: Train a discriminator model to differentiate between original and synthetic data. Evaluate the model's performance using metrics such as accuracy, confusion matrix, and true positive/negative rates to determine how convincingly the synthetic data mimics the real data.
- Data Usefulness: Evaluate the usefulness of synthetic data by comparing the predictive performance of models trained on synthetic versus real data. Relies on an LSTM-based model for time series regression to measure and compare mean absolute errors on test data.
- Energy Distance: Calculate the energy distance metric to assess the similarity and diversity between true and generated data. Measure how close or different the distributions of the real and synthetic datasets are by computing energy distances across multiple samples.
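The energy distance computation can be sketched with SciPy. Note that `scipy.stats.energy_distance` is defined for 1-D distributions, so this sketch (an assumption, not the repository's exact procedure) reports one value per feature by flattening each feature's values across samples and time steps; a fully multivariate energy distance would instead need pairwise-distance sums.

```python
import numpy as np
from scipy.stats import energy_distance

def per_feature_energy_distance(real, synth):
    """Energy distance between real and synthetic data, one value per
    feature; inputs have shape (samples, sequence length, features)."""
    n_features = real.shape[2]
    return [energy_distance(real[:, :, f].ravel(), synth[:, :, f].ravel())
            for f in range(n_features)]

rng = np.random.default_rng(2)
real = rng.normal(0.0, 1.0, (50, 100, 3))
close = rng.normal(0.0, 1.0, (50, 100, 3))  # same distribution as real
far = rng.normal(3.0, 1.0, (50, 100, 3))    # shifted distribution

print(per_feature_energy_distance(real, close))  # near zero
print(per_feature_energy_distance(real, far))    # clearly larger
```

A low energy distance for a well-matched distribution and a high one for a shifted distribution is exactly the similarity/diversity signal described above.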