This repository implements a framework for benchmarking various network filtering techniques. It also provides real and simulated networks and it implements common network filtering techniques for benchmarking purposes. See original research proposal HERE.
- data/: Includes sample and real networks for benchmarking
- real_nets/: Networks extracted from Index of Complex Networks (ICON)
- simulated_nets/: Random networks
- src/: Contains the source code for the benchmarking framework
- benchmark/: Benchmarking utilities and core functionality
- net_filtering/: Implementation of various network filtering techniques
- results/: Directory to store benchmark results and performance metrics
-
Clone the repository:
git clone https://github.com/nathanruf/Benchmarking-Network-Filtering.git -
Navigate to the project directory:
cd Benchmarking-Network-Filtering -
Install the required dependencies:
pip install -r requirements.txt
To run a benchmark:
-
Ensure you have the necessary dataset in the
data/directory. -
Use the
src/benchmark/bench_noise_filtering.pyscript to benchmark a specific filter:from src.benchmark.bench_noise_filtering import bench_noise_filtering from src.net_filtering.filter import Filter import networkx as nx # Load your network G = pickle.load("/data/simulated_nets/random_graph.pickle") # Create a filter instance filter_instance = Filter() # Run the benchmark result = bench_noise_filtering(G, filter_instance.mst) print(f"Result: {result")
- The framework adds noise to the input network using the
add_noise_to_networkfunction. - It then applies the specified filtering technique to the noisy network.
- The performance is evaluated by comparing the filtered network to the original network.
The /data directory contains both real and simulated network datasets for benchmarking purposes.
The /data/real_nets/ directory contains networks extracted from ICON (Index of Complex Networks). These networks represent various real-world systems and phenomena across different domains. See networks metadata here.
- Full Data:
real_net.pickle
Corpus of 550 real-world networks drawn from the Index of Complex Networks (ICON) used in this PNAS paper. This corpus spans a variety of sizes and structures, with 23% social, 23% economic, 32% biological, 12% technological, 3% information, and 7% transportation graphs.
- Sample Data:
real_net_sample.pickle
A sample of 4 hand-picked networks to run quick benchmarking; network_index = [447, 133, 122, 80]. Refer to ICON network metadata to view details of those sample networks.
The /data/simulated_nets/ directory contains artificially generated network datasets. These include:
-
Random Graphs (Erdős-Rényi model)
- Unweighted:
random_graph.pickle - Weighted:
random_graph_weighted.pickle
- Unweighted:
-
Grid Graphs
- Unweighted:
grid_graph.pickle - Weighted:
grid_graph_weighted.pickle
- Unweighted:
-
Barabási-Albert Graphs (Scale-free networks)
- Unweighted:
barabasi_albert_graph.pickle - Weighted:
barabasi_albert_graph_weighted.pickle
- Unweighted:
-
LFR Benchmark Graphs (with community structure)
- Unweighted:
lfr_benchmark_graph.pickle - Weighted:
lfr_benchmark_graph_weighted.pickle
- Unweighted:
-
Watts-Strogatz Graphs (Small-world networks)
- Unweighted:
watts_strogatz_graph.pickle - Weighted:
watts_strogatz_graph_weighted.pickle
- Unweighted:
These simulated networks provide a diverse set of graph structures and properties, allowing for comprehensive benchmarking of filtering techniques across various network types.
To generate new simulated networks or modify existing ones, refer to the data/simulated_nets/generate_simulated_networks.py script.
The src/net_filtering/filter.py file contains various network filtering and graph sparsification techniques. Here's an overview of the available methods:
-
Minimum Spanning Tree (MST)
- Method:
mst(graph) - Computes the Minimum Spanning Tree of a graph, keeping the minimum set of edges that connect all nodes with the lowest total edge weight.
- Method:
-
Planar Maximally Filtered Graph (PMFG)
- Method:
pmfg(graph) - Constructs a planar graph that maximizes the sum of edge weights while maintaining planarity.
- Method:
-
Global Threshold Filter
- Method:
threshold(graph, threshold) - Applies a global threshold to edge weights, removing edges below the specified threshold.
- Method:
-
Local Degree Sparsifier
- Method:
local_degree_sparsifier(G, target_ratio) - Sparsifies the graph based on local node degrees, keeping a target ratio of edges.
- Method:
-
Random Edge Sparsifier
- Method:
random_edge_sparsifier(G, target_ratio, seed=42) - Randomly removes edges to achieve a target sparsity ratio.
- Method:
-
Simmelian Backbone Sparsifier
- Method:
simmelian_sparsifier(G, max_rank=5) - Implements Simmelian backbone sparsification, focusing on strongly embedded edges.
- Method:
-
Disparity Filter
- Method:
disparity_filter(G, alpha=0.05) - Implements the disparity filter technique as described in Serrano et al. (2009) PNAS paper.
- Method:
-
Overlapping Trees
- Method:
overlapping_trees(G, num_trees=3) - Creates a reduced network by combining multiple spanning trees.
- Method:
-
K-Core Decomposition
- Method:
k_core_decomposition(G, k=None) - Implements k-core decomposition, recursively removing nodes with degree less than k.
- Method:
These filtering techniques can be applied to both weighted and unweighted graphs, providing a diverse set of approaches for network reduction and noise filtering. Each method has its own strengths and is suitable for different types of network structures and analysis goals.
To add a new filtering technique:
- Implement your filter in the
src/net_filtering/filter.pyfile. - Ensure your filter function takes a networkx Graph as input and returns a filtered Graph.
- Use the
bench_noise_filteringfunction to benchmark your new filter.