Programs related to the field of Graph Analysis are expessed here. More on Graph Analysis is at the following link: https://www.pinterest.com/HamedShahHosseini/graph-analysis/
Graph theory is said to be originated by the work of Leonard Euler when dealing by the Konigsberg bridges problem. He abstracted the lands (four) by nodes and the bridges (seven) by edges.
- However, Islamic Scholars (9th–14th centuries) used tree diagrams for lineage and knowledge organization.
The Konigsberg bridges problem asks to start at any land areas of the city, walk over each of the seven bridges exactly once, and return to the starting point. Euler proved that there is no solution for this problem.
- What is a graph? A graph is a mathematical structure to represent relationships between objects. Objects are nodes (vertices), and relationships are edges (links). Here, we review the graph definition, and discuss undirected graphs versus directed graphs with examples in Python.
- Graph: Adjacency matrix For a graph with n nodes, the adjacency matrix is of size n-by-n. For simple graphs, the entries of the adjacency matrix is composed of zeros and ones. For undirected graphs, this matrix is symmetric. Here, we give the Python code for defining the adjacency matrix for different types of graphs.
- Graph: Node degree The degree of a node in graph is a simple concept which is quite useful in analyzing and understanding the graph. The degree of a node is the number of edges connected to the node. The definition of node degree for directed graphs includes in-degrees and out-degrees. We have some interesting formulae regarding the node degrees. For example, the degree sum formula (handshaking lemma) states that the sum of degrees of nodes in a graph is twice the number of the edges of the graph. Here, we review the concept of degrees in graphs by examples in Python.
- Adjacency list: The adjacency list is another way to represent graphs. It is a data structure holding a list of nodes of a graph such that each node in the list stores its neighbors. Adjacency lists are particularly efficient for sparse graphs such as social networks, web graphs, and biological networks. Here, we implement a Python class for a graph which holds the adjacency list of the graph in a defaultdict. We build two graphs by our class and show their adjacency lists. As a bonus, a Python function has also been defined to convert our graph to that of NetworkX.
- Walk, trail, path, cycle (circuit), and Eulerian trail/circuit: The concepts that define the sequences of nodes and edges within a graph are reviewed here. The conditions for a graph to have an Eulerian trail or circuit (cycle) are also mentioned. The Python code of this NoteBook contains a small graph analyzer implemented from scratch for walks, trails, paths, cycles, and Eulerian trail or circuit.
- Subgraph: A subgraph is a graph formed from a subset of the nodes and edges of a larger graph. A subgraph preserves the connections from the original graph, but may contain fewer nodes and edges. Here, we review the concept of subgraph and some of its types. Also, we provide a Python function from scratch to check if a graph is a subgraph of another graph. We also use some functions available by NetworkX too, for subgraphs.
- Clique: A clique is subset of nodes in a graph, which for its every two distinct nodes, there is an edge in the graph. In other words, a clique forms a complete subgraph. In this Notebook, we review and implement the Bron-Kerbosch algorithm from scratch to find all maximal cliques in a given undirected graph. Moreover, we use the function find_cliques in NetworkX to compare with our results.
- Trees: A tree is an undirected graph which is connected and acyclic. Or you may say a tree is an undirected connected graph with n nodes which has n-1 edges. Here, we review trees in graph theory and express different kinds of them. Then, we bring some Python code to build trees and do some operations on them. As a bonus, we also create a simple decision tree for medical diagnosis.
- Depth-first search: A depth-first search (DFS) is a search algorithm that is used to traverse graphs or trees. A DFS can be implemented in recursive way or iterative way. Here, we implement both the recursive and iterative versions, and test them by a given graph. We also implement a DFS that finds a path between two nodes, and extend it to find all paths between two nodes.
- Topological sorting (ordering) When we have a DAG (directed acyclic graph), we can form a linear order of nodes such that for every directed edge (u → v), node u comes before node v in the ordering. A DAG is a directed graph that has no cycles. Topological ordering is only possible for DAGs. Here, we give two methods to create such orderings for DAGs: a depth-first search (DFS) method, and Kahn's algorithm. We bring both methods in Python codes. One feature of Kahn's algorthm is that it can report cycles in the given directed graph if such cycles exist. As a bonus, we have also expressed a DFS-based method for checking if a directed graph is a DAG or not.
- Breadth-first search: Breadth-first search (BFS) is another uninformed search algorithm, which is used for traversing graphs (or trees). BFS explores all nodes at the current depth level before moving on to nodes at the next depth level. It employs a queue to keep the order of exploration. Here, we implement a BFS in Python code and use it for exploring nodes of a graph. We also implement the BFS for finding the shortest path from a start node to an end node. As a bonus, we use a BFS to compute the degrees-of-separation between two persons on a small social network.
- Dijkstra's algorithm: This algorithm finds the shortest paths from a single source node to all other nodes in a weighted graph with non-negative edge weights. It uses a greedy approach, repeatedly visiting the closest unvisited node and updating distances to its neighbors. The algorithm guarantees optimal shortest paths and completes in O((|V|+|E|) log|V|) time with a priority queue (for graph G=(V,E)). Here, we implement Dijkstra's algorithm from scratch in Python with the use of a priority queue. The priority queue is created by a min-heap. A simple graph is given to the algorithm to test it. Also, as a bonus, we use the Dijkstra's algorithm to find the strongest friendship path between two users in a social network.
- A* algorithm: A* algorithm is another informed search algorithm that finds the shortest path from a
startnode to agoalnode. It may be viewed as an extention to Dijkstra's algorithm. A* algorithm uses the evaluation functionf(n)=g(n)+h(n), whereg(n)is the actual distance from the start node to current noden, andh(n)is the heuristic which estimates the distance from the current nodento thegoalnode. If the heuristic functionh(.)never overestimates the actual distance to goal, then the A* algorithm will find the optimal path. Such a heuristic is called admissible. Here, we implement A* algorithm from scratch in Python, and test it with a simple grid world. - Greedy best-first search: The GBFS is another informed search algorithm that relies solely on a heuristic function, say
h(.), to explore nodes. Thus, the evaluation function will bef(n)=h(n). The GBFS is a fast algorithm that may not find the optimal path. The GBFS is implemented here from scratch in Python, and then is tested by a grid graph having some obstacles. - Prim's algorithm for MST: Prim's algorithm is used to find a minimum spanning tree (MST) of a weighted undirected graph. A graph may have more than one MST. Prim's algorithm is a node-based MST algorithm. It means that Prim's algorithm starts from a random node and grows it as a tree by always adding cheapest edge that connects a node inside the tree to a node outside the tree. Here, we implement Prim's algorithm from scratch in Python using a binary min-heap as a priority queue. We then apply the algorithm to a simple graph. The original graph and its MST are shown by NetworkX.
- Kruskal's algorithm for MST: Kruskal's algorithm is an edge-based algorithm for obtaining the minimum spanning tree (MST) of a weighted undirected graph. In this algorithm, we first sort edges based on their weight. Then, we add edges one by one until we get
n-1edges for a graph withnnodes such that none of the edges creates cycle. Here, we give the Python code of Kruskal's algorithm from scratch using union-find data structure. Then, we use two example graphs to test the algorithm. - Label propagation: Label propagation is a community detection algorithm. Label propagation discovers clusters (communities) in a graph or network by having nodes adopt the most frequent label among their neighbors, repeating this process until labels stabilize. There is no need to specifiy the number of communities. This algorithm is fast and can work well on large-scale graphs (or networks). We implement label propagation from scratch in Python. Then, we use it with a simple manual graph, and with the Zachary's karate club network. It should be reminded that label propagation might report different communities in different runs; because it is based on randomness, communities sometimes overlap, and networks may have ambiguous nodes that could belong to more than one community.
- What is modularity? Modularity measures how well a network divides into communities by comparing actual connections to random expectations. Its range is from: -0.5 to 1.0. High values indicates strong community structure while zero means it is no better than random. In this notebook, we implement the function to compute modularity from adjacency matrix. We also give a function to compute modularity from adjacency list. Some graphs along different community assignments are given to the function to see how modularity reflects community partitions in numerical values.
- Degree centrality: This the the simplest and straightforward measure for node's importance. Degree centrality counts the number of connections of a node in an unweighted graph (which is the node degree). In a weighted graph, degree centrality of a node is the sum of the edge weights connected to the node. It is reminded that we talked about node degree in an earleir post. Here, we bring the Python code of degree centrality for both unweighted and weighted graphs. We also mention the normalization of degree centrallity. A few examples accompany the Notebook file.
- Betweenness centrality: Betweenness centrality measures how often a node lies on the shortest paths between other nodes in a network. A node with high betweenness means that it acts as bridges or bottlenecks that control information flow. Here, we implement functions to compute betweenness centrality for unweighted graphs and weighted graphs. We test these two functions by some examples. It is reminded that the function to compute betweeness for unweighted graphs uses a breadth-first search (BFS), while for weighted graphs, it uses Dijkstra's algorithm.
- Closeness centrality: Another metric for centrality is the closeness centrality in which we measure how close a node is to all other nodes in a network or graph. It calculates the reciprocal of the sum of shortest path distances from a node to all others. Nodes with high closeness can quickly interact with or reach the rest of the network, making them centrally positioned for efficient communication or influence spread. Here, we implement the functions to compute closeness centrality for both unweighted and weighted graphs. A few examples are also included. It is noted that for disconnected graphs, the harmonic closeness is a better choice than the standard closeness.
- Eigenvector centrality: Eigenvector centrality measures a node's influence (importance) by assigning higher scores to nodes connected to other high-scoring nodes—capturing the idea that importance comes from being connected to important neighbors. Here, we implement power iteration algorithm for estimating the eigenvector centrality both for unweighted and weighted graphs in Python. We test the algorithm by a few examples of graphs.
- PageRank: PageRank is a link analysis algorithm. it was primarily proposed to score web pages in the World Wide Web (or any webgraph). Here, every outgoing link is like a vote that has a value proportional to the rank of the node divided by the number of its outgoing links. Then, a node is important if important nodes link to it. We express the PageRank algorithm from scratch in Python. Also, we test the algorithm with a few directed graphs.