-
Notifications
You must be signed in to change notification settings - Fork 912
Produce fewer instructions in gen_and() and gen_or(). #1573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Produce fewer instructions in gen_and() and gen_or(). #1573
Conversation
|
One cause of the CI failures is a switch on an enum, another is a missing skip condition in a number of test blocks, both are known and trivial to address. Yet another cause is now bug report #1574, which blocks this pull request. |
3d61978 to
97b73f1
Compare
97b73f1 to
0eb8189
Compare
|
This revision contains a workaround to process IPv4 before IPv6, let's see if this is the only thing required to pass all tests. |
0eb8189 to
dd8adad
Compare
|
This rebased revision passes all tests, one commit has been merged into the master branch. This is going to be merged in a few days. |
dd8adad to
35247ea
Compare
|
Rebased on the current master branch. I would like to try making the messy diff commit look better before merging. |
35247ea to
4298c66
Compare
|
This revision has the previously messy diff readable and incorporates a few cosmetic improvements. It looks ready to me and is going to be merged tomorrow. |
The existing tests that mix "ip host IPV4ADDR" and "host IPV4ADDR" etc. on DLT_RAW and declare the same optimized filter program do not cover the problem space well enough. One problem with that is such blocks cannot declare an unoptimized filter program because in the unoptimized space the proto-qualified form does not contain any ARP/RARP branches, but proto-unqualified form contains them as unreachable instructions. Another problem is that this way it is impossible to verify that the proto-unqualified form matches ARP and ARP under right conditions. Thus move the proto-unqualified aliases to their own test blocks and use a DLT that supports all the implied protocols, and many others -- this way the test proves that the implicit set of protocols extends as far as it is supposed to extend, and no further. Limit the new blocks to lowercase hostnames only, and repeat a quarter (with a fixed dir qualifier) of the new blocks for DLT_RAW to cover the elimination of ARP and RARP in greater detail for the next commits. This adds the following accept test blocks: * net_ipv4addr_16bit_IP_OVER_FC (from ip_net_addr) * src_net_ipv4addr_24bit_IP_OVER_FC (from ip_src_net_addr) * dst_net_ipv4addr_8bit_IP_OVER_FC (from ip_dst_net_addr_8) * src_and_dst_net_ipv4addr_12bit_IP_OVER_FC (new) * src_and_dst_net_ipv4addr_12bit_RAW (new) * net_ipv6addr_IP_OVER_FC (from ip6_net) * src_net_ipv6addr_IP_OVER_FC (from ip6_src_net) * dst_net_ipv6addr_IP_OVER_FC (new) * src_and_dst_net_ipv6addr_IP_OVER_FC (new) * src_and_dst_net_ipv6addr_RAW (new) * net_name_EN10MB (new) * src_net_name_EN10MB (from ip_src_net_name and ip_src_net_NAME) * dst_net_name_EN10MB (from ip_dst_net_name and ip_dst_net_NAME) * src_and_dst_net_name_EN10MB (new) * src_and_dst_net_name_RAW (new) * host_ipv4addr_FDDI (from ip_host_addr) * src_host_ipv4addr_FDDI (from ip_src_host_addr) * dst_host_ipv4addr_FDDI (from ip_dst_host_addr) * src_and_dst_host_ipv4addr_FDDI (new) * src_and_dst_host_ipv4addr_RAW (new) * host_ipv6addr_IEEE802_11 (from ip6_host_addr) * src_host_ipv6addr_IEEE802_11 (from ip6_src_host_addr) * dst_host_ipv6addr_IEEE802_11 (from ip6_dst_host_addr) * src_and_dst_host_ipv6addr_IEEE802_11 (new) * src_and_dst_host_ipv6addr_RAW (new) * host_noeth_ipv4_noipv6_EN10MB (from ip_host_name/ip_host_NAME) * host_noeth_noipv4_ipv6_EN10MB (from ip6_host_name/ip6_host_NAME) * host_noeth_ipv4_ipv6_EN10MB (new) * src_host_noeth_ipv4_noipv6_EN10MB (from ip_src_host_name and ip_src_host_NAME) * src_host_noeth_noipv4_ipv6_EN10MB (from ip6_src_host_name and ip6_src_host_NAME) * src_host_noeth_ipv4_ipv6_EN10MB (new) * dst_host_noeth_ipv4_noipv6_EN10MB (from ip_dst_host_name and ip_dst_host_NAME) * dst_host_noeth_noipv4_ipv6_EN10MB (from ip6_dst_host_name and ip6_dst_host_NAME) * dst_host_noeth_ipv4_ipv6_EN10MB (new) * src_and_dst_host_noeth_ipv4_noipv6_EN10MB (new) * src_and_dst_host_noeth_noipv4_ipv6_EN10MB (new) * src_and_dst_host_noeth_ipv4_ipv6_EN10MB (new) * src_and_dst_host_noeth_ipv4_noipv6_RAW (new) * src_and_dst_host_noeth_noipv4_ipv6_RAW (new) * src_and_dst_host_noeth_ipv4_ipv6_RAW (new)
In the accept block structure make sure both "opt" and "unopt" are always _declared_ (i.e. Perl hash keys present), and introduce "optunopt" as a shorthand to _define_ (i.e. to set to something other than Perl undef) both values at once. Always require "unopt" to be _defined_ because that's the least of what can be expected from an accept test; allow "opt" to be _declared_ as undef because several accept tests cannot run optimized. Require all _defined_ results to be non-empty; require "opt" and "unopt" not to be equal if each has been defined explicitly rather than expanded from "optunopt". Convert most (505) accept blocks to define "optunopt", in the other 145 blocks define both "opt" and "unopt", in the remaining 11 blocks define "unopt" and specify "opt" as undef.
Also always use the result, except a few times for gen_not(). This rewrites a lot of invocations, but preserves the logic, so there are no changes to the tests.
In the current implementation gen_true() produces "ld #0x0; jeq #0x0" and gen_false() produces "ld #0x1; jeq #0x0". When such a block becomes an argument of gen_and() or gen_or(), it is not quite trivial to detect the original Boolean constant behind it, so instead of potentially reducing the logical AND/OR to exactly one of its arguments the functions always combine both blocks in a generic manner. This accounts to at least a fraction of dead branches in some unoptimized filter programs. Subsequently the optimizer (if it is available for the DLT and enabled by the user) eliminates many (but not all, see GH the-tcpdump-group#1022) of such no-op instructions, but it would be better not to produce them in the first place. As the earlier commits 8620df1, 8788f68 and 6ffae89 show, such reductions of a few ANDs indeed have the desired effect, but depend on specialized code outwith gen_and(). To cover all such use cases using simpler code, generalize the solution. Add another field to struct block to convey the Boolean meaning of a block (definitely true, definitely false, something else), and process that in gen_uncond(), gen_and(), gen_or() and gen_not(). This conflicts with the old style of taking the result of gen_and() and gen_or() from the second argument, so apply PCAP_WARN_UNUSED_RESULT and add comments. As the affected tests show, this has varying effect on the number of instructions: DLT_RAW "host IPV4ADDR" (-12 unopt.) DLT_RAW "net IPV4ADDR" (-16 unopt.) DLT_RAW "net NAME" (-12 unopt.) DLT_RAW "(arp|rarp) host IPV4ADDR" (-4 unopt.) DLT_EN10MB "vxlan [NUM]" (-10 unopt.) DLT_EN10MB "vxlan and vxlan" (-7 opt. and -20 unopt.) DLT_EN10MB "geneve [NUM]" (-21 opt. and -21 unopt.) DLT_EN10MB "geneve and geneve" (-42 opt. and -42 unopt.) DLT_IPV4 "ip multicast" (-1 opt. and -2 unopt.) DLT_IPV4 "ip broadcast" (-1 opt. and -2 unopt.) DLT_IPV6 "ip6 multicast" (-1 opt. and -2 unopt.) DLT_IPV6 "icmp6[NUMBER] != NUMBER" (-1 opt. and -1 unopt.) There is no change in any "host NAME" accept blocks: the conditional q.proto expansion in gen_scode() case Q_HOST eliminates both ARP and RARP already.
The only effect of the conditional override that expands Q_DEFAULT to Q_IP and Q_IPV6 (but not Q_ARP or Q_RARP) used to be absence of the ARP and RARP dead branches in the unoptimized filter program for a number of DLTs. Since this is now a particular case of the generic AND/OR reduction, remove the custom code -- this produces exactly the same filter programs, so there is no change in any tests.
This code now is a particular case of what gen_and() does, so there is no point in keeping it. There is no change in any tests because the expected optimizations hold as before.
4298c66 to
730a8db
Compare
|
Had to fix a commit message, this is going to be merged soon after the CI has passed. |
|
As it turns out, |
|
The regression concerns GENEVE and VXLAN only. My working copy has a bug fix, it needs better comments and better tests before committing. |
|
Also there is another bug, for which there is a separate work-in-progress bug fix. |
Recently I attempted adding IPv6 support to the
gatewayprimitive, which did not work in the expected trivial manner, but instead produced a number of useful by-products. One of those is bug report #1571, another is this series of commits. It includes a more reliable revision of the commit I earlier removed from pull request #1498 (Have gen_and(), gen_or() and gen_not() return the result.), which in turn enables an early optimization of logical AND and OR as discussed in bug report #1022 — the optimization reduces that problem space a little bit. Another optimization (Do not special-case BPF_JEQ in gen_relation_internal().) addresses a different root cause, but produces comparable results.One of the main complexities here was to compose the tests right to make sure they cover the affected code paths, but at the same time do not inflate the combinatoric space too much. Another was to organize the changes into reasonably focused commits, which after many rounds of clean-ups looks about right. One exception is the 40 new test blocks, which look very messy in the diff, even though they mostly add new Perl code.