[BUG] Performance Analyzer breaks commutativity of NetworkPlugin.getTransportInterceptors #840

@cwperks

What is the bug?

When PA is running in Dual mode, any request to index a document fails on the replica shard with a security_exception. The primary write succeeds, but the response reports a shard failure:

curl -XPUT "https://admin:myStrongPassword123\!@localhost:9200/movies/_doc/3" -k \
  -H "Content-Type: application/json" \
  -d '{"hello":"world"}'
{"_index":"movies","_id":"3","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":1,"failures":[{"_index":"movies","_shard":0,"_node":"_QoDimt_S7KPc-7gqkfT5w","reason":{"type":"security_exception","reason":"Internal or shard requests not allowed from a non-server node for transport type RTFPerformanceAnalyzerTransportChannelType"},"status":"INTERNAL_SERVER_ERROR","primary":false}]},"_seq_no":0,"_primary_term":1}

Repro steps:

curl -XPOST https://admin:myStrongPassword123\!@localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -k

{"currentPerformanceAnalyzerClusterState":0,"shardsPerCollection":0,"collectorsSetting":0,"batchMetricsRetentionPeriodMinutes":7}

curl -XPUT https://admin:myStrongPassword123\!@localhost:9200/_cluster/settings -k \
-H "Content-Type: application/json" \
-d '{"transient": {"cluster.metadata.perf_analyzer.collectors.mode":"2"}}'

{"acknowledged":true,"persistent":{},"transient":{"cluster":{"metadata":{"perf_analyzer":{"collectors":{"mode":"2"}}}}}}

curl -XPUT https://admin:myStrongPassword123\!@localhost:9200/_cluster/settings -k \
-H "Content-Type: application/json" \
-d '{"persistent": {"cluster.metadata.perf_analyzer.collectors.mode":"1"}}'

{"acknowledged":true,"persistent":{"cluster":{"metadata":{"perf_analyzer":{"collectors":{"mode":"1"}}}}},"transient":{}}

Once the settings above are configured, try to index a document.

Additional details:

There is already tight coupling between the Security plugin and PA: Security's SecurityInterceptor contains logic that tries to peel off the inner channel if the channel has been wrapped.

Code refs:

The reason for the failure is that in DUAL mode, PA doubly wraps the transport channel. When Security unwraps the outer layer, it still encounters another wrapped channel and stops unwrapping because it assumes the channel is wrapped at most once.
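The failure mode described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: Channel, ServerChannel, WrappedChannel, unwrapOnce, unwrapFully, and the "HypotheticalOuterChannelType" string are hypothetical stand-ins, not the real OpenSearch, Security, or PA classes. Only the RTFPerformanceAnalyzerTransportChannelType name is taken from the error message.

```java
// Hypothetical stand-ins; NOT the real OpenSearch, Security, or PA classes.
public class DoubleWrapSketch {
    interface Channel {
        String channelType();
    }

    // Stand-in for the plain server-side transport channel.
    static final class ServerChannel implements Channel {
        public String channelType() { return "transport"; }
    }

    // Stand-in for a wrapper channel that delegates to an inner channel.
    static final class WrappedChannel implements Channel {
        private final Channel inner;
        private final String type;

        WrappedChannel(Channel inner, String type) {
            this.inner = inner;
            this.type = type;
        }

        Channel inner() { return inner; }

        public String channelType() { return type; }
    }

    // Peels exactly one layer: the "wrapped at most once" assumption.
    static Channel unwrapOnce(Channel c) {
        return (c instanceof WrappedChannel) ? ((WrappedChannel) c).inner() : c;
    }

    // Peels every layer; shown only for contrast with unwrapOnce.
    static Channel unwrapFully(Channel c) {
        while (c instanceof WrappedChannel) {
            c = ((WrappedChannel) c).inner();
        }
        return c;
    }

    public static void main(String[] args) {
        Channel base = new ServerChannel();
        // Dual mode (assumed shape): two wrapper layers around the same channel.
        Channel inner = new WrappedChannel(base, "RTFPerformanceAnalyzerTransportChannelType");
        Channel dual = new WrappedChannel(inner, "HypotheticalOuterChannelType");

        // One layer removed still leaves a wrapper, so a type check would reject it.
        System.out.println(unwrapOnce(dual).channelType());
        // All layers removed reaches the plain server channel.
        System.out.println(unwrapFully(dual).channelType());
    }
}
```

The first line printed is the RTF channel type from the error message and the second is the plain "transport" type. The loop in unwrapFully is there to make the single-layer assumption visible; it is not a claim about how either plugin should be fixed.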

What's changed?

In OpenSearch 3.0, AD declared an optional extends relationship on the Security plugin. As a result, Security is loaded before AD and is likely loaded before Performance Analyzer as well. In prior versions, PA was likely loaded before Security.

On node bootstrap, the NetworkModule in core iterates through the network plugins to create a composite transport interceptor, which is composed of all transport interceptors registered via NetworkPlugin.getTransportInterceptors(). Since there is no notion of ordering for transport interceptors, the assumption is that these interceptors are commutative, meaning that the order in which they are applied does not matter. PA breaks this assumption.
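To make the commutativity point concrete, here is a minimal, self-contained sketch of two interceptors whose composition depends on order. Everything in it is an assumption made for illustration: Handler, WRAPPING, CHECKING, and compose are hypothetical names, channels are reduced to plain strings, and the fold in compose only approximates how a composite interceptor might chain handlers.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model; NOT the real TransportInterceptor API.
public class InterceptorOrderSketch {
    // A handler receives the type of the channel it is asked to respond on.
    interface Handler {
        String handle(String channelType);
    }

    // Stand-in for a wrapping interceptor: passes a wrapped channel downstream.
    static final UnaryOperator<Handler> WRAPPING =
        next -> type -> next.handle("wrapped(" + type + ")");

    // Stand-in for a checking interceptor: rejects anything but the plain type.
    static final UnaryOperator<Handler> CHECKING =
        next -> type -> type.equals("transport") ? next.handle(type) : "rejected: " + type;

    // Each interceptor wraps the handler built so far, so the last
    // interceptor in the list ends up outermost and runs first.
    static Handler compose(List<UnaryOperator<Handler>> interceptors, Handler terminal) {
        Handler handler = terminal;
        for (UnaryOperator<Handler> interceptor : interceptors) {
            handler = interceptor.apply(handler);
        }
        return handler;
    }

    public static void main(String[] args) {
        Handler terminal = type -> "handled on " + type;

        // CHECKING runs first and sees the plain channel, so the request passes.
        System.out.println(compose(List.of(WRAPPING, CHECKING), terminal).handle("transport"));

        // WRAPPING runs first, so CHECKING sees a wrapped channel and rejects it.
        System.out.println(compose(List.of(CHECKING, WRAPPING), terminal).handle("transport"));
    }
}
```

The same two interceptors accept the request in one order and reject it in the other, which is the sense in which a channel-wrapping interceptor is not commutative with one that inspects the channel type.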

To fix the issue, PA needs to revisit the logic for wrapping the channel or introduce a mechanism to the core for ordering the transport interceptors if they are no longer supposed to be commutative.
