|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Toward Polynomial-Time Verification of Networks with Infinite State Spaces: An Automata-Theoretic Approach" |
| 4 | +subtitle: "The AalWiNes What-if Analysis Suite" |
| 5 | +authors: [schmid, srba] |
| 6 | +categories: [research, network, verification] |
| 7 | +image: assets/images/aalwines-screenshot.png |
| 8 | +tags: [] |
| 9 | +--- |
| 10 | + |
| 11 | +With the increasing scale of communication networks, |
| 12 | +failures (e.g. link failures) are becoming the norm rather than the |
| 13 | +exception. Given the critical role such networks play for our |
| 14 | +digital society, it is important to ensure a reliable and |
| 15 | +efficient operation of such networks, even in the presence |
| 16 | +of one or multiple failures. |
| 17 | +While several interesting automated approaches to |
| 18 | +verify and operate networks are emerging, offering an attractive |
| 19 | +alternative to today’s pragmatic and manual “fix it when it breaks” approach, existing solutions often only provide a limited and inefficient |
| 20 | +support for reasoning about failure |
| 21 | +scenarios. In particular, verifying networks is a complex task, even for computers. For example, many existing automated |
| 22 | +solutions require to test each possible failure scenario (respectively, each possible dataplane) individually, resulting in a runtime which can be exponential in the number of failures; which makes this approach impractical for large-scale communication networks. In fact, certain communication networks, such as networks based on MultiPath Label Switching (MPLS) or Segment Routing (SR), may even have a theoretically infinite configuration space: in MPLS and SR networks, default routes as well as failover routes depend on the label stack in the packet header, which may grow and shrink arbitrarily along the route. |
| 23 | + |
| 24 | +In this article, we argue that automata-theoretic approaches can offer an interesting solution to these problems, allowing to perform fast verification, accounting even for failure scenarios and dealing with potentially infinite state spaces. |
| 25 | + |
| 26 | +## A case for verifying network configurations |
| 27 | + |
| 28 | +Before delving into the details, we first argue that it is often attractive to verify the |
| 29 | +network configuration directly: compared to verifying |
| 30 | +control plane protocols, analyzing the dataplane |
| 31 | +configuration (e.g., forwarding and conditional |
| 32 | +failover tables) has the advantage that it |
| 33 | +also accounts for possible external changes, |
| 34 | +e.g., manual changes through the CLI, bugs, or |
| 35 | +changes made by |
| 36 | +additional protocols. Moreover, dataplane-based approaches can also reveal |
| 37 | +problems in the network operation introduced by the algorithms |
| 38 | +that generate the routing tables from the control plane specifications. |
| 39 | + |
| 40 | +Let us consider the verification of forwarding rules in MPLS routing tables as an example. |
| 41 | +In a nutshell, in MPLS, packet labels can be nested |
| 42 | +in order to provide tunneling through the network or to handle link |
| 43 | +failures by the fast-reroute mechanism. This mechanism relies on pushing a new MPLS |
| 44 | +label on top of the label stack, to redirect the flow |
| 45 | +to go around the failed link. The mechanism can be applied several times |
| 46 | +in case of multiple link failures, creating larger and larger (in theory: unbounded) |
| 47 | +numbers of MPLS labels on the label-stack. |
| 48 | + |
| 49 | +A fairly general query language which supports to reason about network behavior under failures could be based on regular expressions: |
| 50 | + |
| 51 | +$ < a > b < c > k $ |
| 52 | + |
| 53 | +Here $a$ and $c$ are regular expressions over MPLS labels that describe the |
| 54 | +set of allowed initial resp. final headers of packets in the |
| 55 | +trace, $b$ is a regular expression over the links in the network, defining the set |
| 56 | +of allowed routing traces through the network, and $k$ is a number |
| 57 | +specifying the maximum number of failed links to be accounted for. |
| 58 | + |
| 59 | + |
| 60 | +By using regular expressions, we can test properties such as waypoint enforcement (e.g., is the |
| 61 | +traffic always forwarded through an intrusion detection system) |
| 62 | +or avoidance of certain routers in selected countries. |
| 63 | + |
| 64 | +## How does polynomial-time what-if analysis work? |
| 65 | + |
| 66 | +In order to evaluate such queries and conduct what-if analysis quickly, we make the case for a model-checking algorithm |
| 67 | +which is based on automata theory. Our approach is motivated by a natural (but not well-known) connection to the theory |
| 68 | +of pushdown automata and prefix rewriting systems. |
| 69 | +In a nutshell, given the network configuration, the routing tables, |
| 70 | +as well as the query, a pushdown automaton (PDA) can be constructed. |
| 71 | +On this PDA, we can then perform a standard reachability analysis. In more details, |
| 72 | +the regular expressions for the initial header and the final |
| 73 | +header of the query can each be converted to first a |
| 74 | +Nondeterministic Finite Automaton (NFA) and then to a Pushdown |
| 75 | +Automaton (PDA). The path query is converted to an NFA, which can then be used to restrict (by synchronous product) |
| 76 | +the behavior of the PDA constructed based on the network model. |
| 77 | +The three PDAs can be combined into a single PDA that can be given |
| 78 | +to the backend engine that checks the emptiness of the PDA language. |
| 79 | +In case the language is nonempty, we can return a witness trace |
| 80 | +to demonstrate the reason why the query is satisfied. All these operations can be performed in polynomial time, |
| 81 | +leveraging a classic result by J. Büchi. |
| 82 | + |
| 83 | +## The AalWiNes Tool and Example |
| 84 | + |
| 85 | +As a case study and to explore the feasibility of such an automata-theoretic approach, |
| 86 | +we developed a framework, the AalWiNes suite, which allows to load an arbitrary network topology |
| 87 | +(e.g., from the topology zoo dataset) as well as a set of router configurations (e.g., from Juniper routers). |
| 88 | +The configurations are put into a standardized, intermediate format which is vendor independent. |
| 89 | +The user can then define a query as described above, which leads the tool to prompt with a result |
| 90 | +(e.g. a trace). AalWiNes is a joint work at Aalborg University and the University of Vienna. |
| 91 | + |
| 92 | +For MPLS specifically, we have devloped the AalWiNes tool suite, implemented in C++, and improving upon an earlier |
| 93 | +prototype implementation in Python [[1]](#references). |
| 94 | +The tool allows to verify a wide range of important network properties in polynomial time, |
| 95 | +parameterized by the $k$. |
| 96 | + |
| 97 | +Here is a quick intro to AalWines, using an interative |
| 98 | +web-browser integration of our tool available at [demo.aalwines.cs.aau.dk](https://demo.aalwines.cs.aau.dk/): |
| 99 | + |
| 100 | +[](http://www.youtube.com/watch?v=mvXAn9i7_Q0 "AalWiNes Quick Intro") |
| 101 | + |
| 102 | +AalWiNes also allows to account for more complex traffic engineering aspects, such as load-balancing, |
| 103 | +by supporting nondeterminism, as well as more complex multi-operation chains. |
| 104 | +The tool includes several optimizations to further improve the performance, such as “top of stack reduction”, |
| 105 | +which safely calculates which labels can be at the top of stack in a given state of the PDA: the top of stack |
| 106 | +reduction technique greatly reduces the amount of transitions in the PDA. |
| 107 | + |
| 108 | +We refer to our [tool website](https://aalwines.cs.aau.dk/) as well as to our |
| 109 | +[project website](https://github.com/DEIS-Tools/AalWiNes). |
| 110 | + |
| 111 | +## Conclusion and future work |
| 112 | + |
| 113 | +Our aim is to provide a low runtime which is critical for the success of network |
| 114 | +verification, however, reasoning about failures seems particularly challenging with respect to the computational complexity. |
| 115 | +While not every network may allow for a polynomial-time analysis, |
| 116 | +we see much potential in automata-theoretic approaches, |
| 117 | +and we hope this article can inspire the community to conduct more research |
| 118 | +in this area. |
| 119 | + |
| 120 | +We understand our approach as a first step and believe that it opens the door to |
| 121 | +verify many other infinite state systems whose configurations |
| 122 | +cannot explicitly be represented anymore. |
| 123 | +For example, we have recently extended AalWiNes to support a witness trace generation which |
| 124 | +accounts for additional metrics; for example, the tool can now select (among a possibly infinite number of witness traces) |
| 125 | +the ones that minimize the number of hops, latency, or the stack height that corresponds to the number |
| 126 | +of MPLS tunnels. This is |
| 127 | +achieved by extending the PDA reachability analysis with multi-dimensional weights. |
| 128 | + |
| 129 | +Another interesting avenue for future research is |
| 130 | +the study of similar approaches for control plane verification |
| 131 | +or for supporting capacity planning, also taking into account |
| 132 | +resource constraints on failover paths. |
| 133 | + |
| 134 | +## References |
| 135 | + |
| 136 | +\[1\] : P-Rex: Fast Verification of MPLS Networks with Multiple Link Failures. |
| 137 | +Jesper Stenbjerg Jensen, Troels Beck Krogh, Jonas Sand Madsen, Stefan Schmid, Jiri Srba, |
| 138 | +and Marc Tom Thorgersen. 14th ACM International Conference on emerging Networking EXperiments |
| 139 | +and Technologies (CoNEXT), Heraklion/Crete, Greece, December 2018. |
0 commit comments