WebGraphReader is a utility that reads .graph files and extracts scenario and node files in the input format expected by KafkaDataSend / SendToFlinkPregel.
The tool provides two main extractors:
The first extracts the edge list as a scenario file. Each entry contains:
Edge ID, Source Node ID, Target Node ID, Edge Weight
assignEdgeWeights = true: Randomly assigns edge weights between minBound and maxBound.
removeLoops = true: Removes self-loops (edges whose source and target nodes are the same).
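The two edge options can be illustrated with a minimal Python sketch. It is not the tool's actual code: the function name `extract_edges`, the `default_weight` used when weights are not assigned, the inclusive integer bounds, and the sequential edge IDs starting at 0 are all assumptions made for illustration.

```python
import random


def extract_edges(edges, assign_edge_weights=False, remove_loops=False,
                  min_bound=1, max_bound=10, default_weight=1, seed=None):
    """Turn (source, target) pairs into (edge_id, source, target, weight) rows.

    Hypothetical helper: the ID scheme and default weight are assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for source, target in edges:
        if remove_loops and source == target:
            continue  # drop self-loops before an ID is assigned
        if assign_edge_weights:
            # randint is inclusive on both ends (assumed to match minBound/maxBound)
            weight = rng.randint(min_bound, max_bound)
        else:
            weight = default_weight
        # len(rows) keeps edge IDs contiguous even when loops are skipped
        rows.append((len(rows), source, target, weight))
    return rows
```

Assigning IDs after filtering keeps them gap-free when self-loops are removed; whether WebGraphReader numbers edges this way is not stated here.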
The second extracts the node list as a node file. Each entry contains:
Node ID, Block ID, Node Weight
assignNodeWeights = true: Randomly assigns node weights between minBound and maxBound.
assignBlockID = true: Assigns Block IDs based on the hostname in the URL. All URLs with the same hostname are assigned the same block.
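The hostname-based block assignment can be sketched in Python as follows. This is an illustration under assumptions, not the tool's implementation: `extract_nodes`, `default_weight`, and `default_block` are hypothetical names, node IDs are taken from the position of each URL in the input, and block IDs are numbered in order of first appearance.

```python
import random
from urllib.parse import urlsplit


def extract_nodes(urls, assign_node_weights=False, assign_block_id=False,
                  min_bound=1, max_bound=10, default_weight=1,
                  default_block=0, seed=None):
    """Turn a list of URLs into (node_id, block_id, weight) rows.

    Hypothetical helper: ID numbering and defaults are assumptions.
    """
    rng = random.Random(seed)
    block_of_host = {}  # hostname -> block ID, in order of first appearance
    rows = []
    for node_id, url in enumerate(urls):
        if assign_block_id:
            # hostname is None for URLs without a scheme; group those under ""
            host = urlsplit(url).hostname or ""
            block_id = block_of_host.setdefault(host, len(block_of_host))
        else:
            block_id = default_block
        if assign_node_weights:
            weight = rng.randint(min_bound, max_bound)  # inclusive bounds
        else:
            weight = default_weight
        rows.append((node_id, block_id, weight))
    return rows
```

For example, `http://a.org/x` and `http://a.org/y` share the hostname `a.org`, so both land in the same block, while `http://b.org/` gets a new one.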
A blockstats.txt file is also generated, summarizing the number of nodes per block.
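A per-block summary of this kind could be produced as in the sketch below. The actual layout of blockstats.txt is not described here, so the one-line-per-block "block_id node_count" format and the helper names `block_stats_lines` and `write_block_stats` are assumptions.

```python
from collections import Counter


def block_stats_lines(block_ids):
    """Return one 'block_id node_count' line per block, sorted by block ID.

    The line format is an assumption, not the tool's documented output.
    """
    counts = Counter(block_ids)
    return [f"{block} {counts[block]}" for block in sorted(counts)]


def write_block_stats(block_ids, path="blockstats.txt"):
    """Write the summary lines to a text file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(block_stats_lines(block_ids)) + "\n")
```

Here `block_ids` is simply the Block ID column of the node file, one entry per node.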