SASL Support #245

Open
gene-redpanda wants to merge 6 commits into main from test-pr-138-fixes

@gene-redpanda
Member

No description provided.

@gene-redpanda added the ci-ready label (indicates that a PR is ready for builds to run) on Feb 2, 2026
@gene-redpanda force-pushed the test-pr-138-fixes branch 3 times, most recently from 1638cf0 to dd19144, on February 5, 2026 14:11
@gene-redpanda changed the title from "test: point to pr/138-fixes branch for SASL testing" to "SASL Support" on Feb 17, 2026
@gene-redpanda marked this pull request as ready for review on February 17, 2026 20:57
Comment on lines +4 to +6
type: git
source: https://github.com/redpanda-data/redpanda-ansible-collection.git
version: pr/138-fixes

Should this be removed after PR 138 is merged?

Great catch. This PR should be merged only after the Ansible PR for the above branch is merged, and this git pin should be removed before this PR is merged.

@vuldin vuldin left a comment

PR Review: SASL Support

1. requirements.yml pinned to ephemeral branch — BLOCKER

The commit message itself says "temp: point to ansible collection git branch". This changes the redpanda.cluster source from Galaxy to git branch pr/138-fixes:

- name: redpanda.cluster
  type: git
  source: https://github.com/redpanda-data/redpanda-ansible-collection.git
  version: pr/138-fixes

This must not merge to main. If that branch is deleted or force-pushed, all CI breaks. Either:

  • Wait for the upstream PR to merge and a Galaxy release, then revert to type: galaxy with a version pin.
  • Or pin to a specific commit SHA as an interim measure.

The git SHA logging added to .tasks/ansible.yml is useful for debugging this temp state but should probably also be removed before merge.
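
As a rough illustration of the commit-SHA option, a small shell check could tell a mutable pin from an immutable one. This is only a sketch: the helper names `is_commit_sha` and `classify_pin` are hypothetical, and the 40-character value is a placeholder, not a real commit in any repository.

```shell
#!/bin/sh
# is_commit_sha succeeds only for a full 40-character lowercase hex commit ID.
is_commit_sha() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# classify_pin labels the `version:` value of a git-sourced requirement.
# Anything that is not a full SHA (branch or tag name) is treated as mutable.
classify_pin() {
  if is_commit_sha "$1"; then
    echo "immutable"
  else
    echo "mutable"
  fi
}

branch_pin=$(classify_pin 'pr/138-fixes')
# Placeholder SHA for illustration only.
sha_pin=$(classify_pin '0123456789abcdef0123456789abcdef01234567')

echo "pr/138-fixes    -> $branch_pin"
echo "placeholder SHA -> $sha_pin"
```

A check along these lines could run in CI to flag a branch pin before merge, although wiring it to the actual requirements.yml is left out here.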


2. Unquoted password in Taskfile command lines

In .tasks/cluster.yml, the password is passed bare:

--extra-vars sasl_superuser_password={{.REDPANDA_SASL_PASSWORD}}

If the password contains spaces, $, backticks, !, etc., the command breaks or behaves unexpectedly, so the value should be quoted. The password will also appear in ps output and CI logs. The sasl:idempotent variant compounds this by setting redpanda_broker_no_log=false while passing the password on the same command line, and the check-idempotency.sh invocation repeats the full command, so the password is logged twice.

Consider quoting at minimum:

--extra-vars 'sasl_superuser_password={{.REDPANDA_SASL_PASSWORD}}'
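
To illustrate why the quoting matters, here is a minimal shell sketch. It assumes the templated value is pasted into the command text before a shell parses it, and it uses a made-up password (`correct horse`), not one from this PR.

```shell
#!/bin/sh
# Hypothetical password containing a space; not a value from the PR.
password='correct horse'

# Simulate template rendering: the value is pasted into the command text
# before a shell parses it.
unquoted_cmd="printf '%s\n' --extra-vars sasl_superuser_password=$password"
quoted_cmd="printf '%s\n' --extra-vars 'sasl_superuser_password=$password'"

# printf emits one line per argument, so the line count is the number of
# arguments the parsing shell actually produced.
unquoted_count=$(sh -c "$unquoted_cmd" | wc -l | tr -d ' ')
quoted_count=$(sh -c "$quoted_cmd" | wc -l | tr -d ' ')

echo "unquoted: $unquoted_count arguments"
echo "quoted:   $quoted_count arguments"
```

Run as-is, this should report three arguments for the unquoted form and two for the quoted form. Single quotes still break if the password itself contains a single quote, so they are a minimum rather than a complete fix.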

3. Concurrency group mismatch for aws fedora sasl

aws fedora sasl uses concurrency_group: aws-fd-cn, which is the connect group. Meanwhile aws ubuntu sasl uses concurrency_group: aws-sl. This is asymmetric — fedora SASL will queue behind connect jobs and vice versa. Was this intentional for resource management, or should it use aws-sl (or a new group)?


4. Unstable tiered storage coverage removed

All four unstable tiered storage jobs are replaced by SASL jobs:

  • unstable aws fedora tiered / tiered large (including is4gen.4xlarge ARM64)
  • unstable aws ubuntu tiered / tiered large

Stable tiered storage jobs still exist, but unstable (pre-release) tiered storage testing — especially the ARM64 large-instance variant — is completely gone. If unstable tiered regressions are a concern, this is a coverage gap.


5. REDPANDA_SASL_PASSWORD not in Buildkite Docker environment

None of the SASL pipeline jobs pass REDPANDA_SASL_PASSWORD through the Docker environment. This works because Taskfile.yml defaults it to cicd-sasl-password, and the test scripts also default to the same value. But if you ever want to pull the password from the secrets manager instead of using a hardcoded default, it would need to be added to the environment lists.
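
The silent fallback described here can be sketched in shell. The function name `resolve_password` and the value `from-secrets-manager` are hypothetical; only the variable name and the `cicd-sasl-password` default come from this review.

```shell
#!/bin/sh
# resolve_password prints the password a script would use: the environment
# value when set and non-empty, otherwise the hardcoded CI default.
resolve_password() {
  printf '%s\n' "${REDPANDA_SASL_PASSWORD:-cicd-sasl-password}"
}

# Case 1: the variable never reaches the container, so the default is used
# without any warning.
from_default=$(unset REDPANDA_SASL_PASSWORD; resolve_password)

# Case 2: the variable is passed through (hypothetical value), so it wins.
from_env=$(REDPANDA_SASL_PASSWORD='from-secrets-manager'; resolve_password)

echo "not passed through: $from_default"
echo "passed through:     $from_env"
```

Because the first case succeeds quietly, a secret that is configured but not listed in the Docker environment would go unnoticed: the jobs would keep passing with the default.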


6. Minor: Inconsistent no_log on license task

In operation-apply-license.yml, the "Set Redpanda license (string)" task has unconditional no_log: true, while every other task uses no_log: "{{ kafka_enable_authorization | default(false) }}". The license string is sensitive so this may be intentional, but it means non-SASL license failures produce zero debug output. A comment explaining the intent would help.


Non-blocking observations

  • Password in process list: rpk_opts interpolates sasl_superuser_password directly into shell commands. no_log prevents Ansible console output but not /proc/<pid>/cmdline or ps visibility on target hosts. A longer-term improvement would be rpk profiles or env vars (RPK_USER/RPK_PASS). Fine for now.
  • Hardcoded defaults in manage-sasl-users.yml: producer-secret and consumer-secret will be silently used if env vars aren't set. This is a template/example playbook so probably fine, but worth a comment in the file.
  • Test scripts leak password in CI logs: The test:cluster:sasl and test:schema:sasl tasks run rpk/curl with the password in shell commands that get logged. Acceptable for CI with a throwaway password, but worth knowing.

@vuldin vuldin left a comment

I just reviewed this PR and it looks to contain all the functionality from #243 but without the service accounts being created (which I believe is the right approach since it will rely on the ephemeral users instead). I know @rockwotj mentioned knowing of future changes to how this works, but I assume any incoming changes won't break this approach.

@gene-redpanda
Member Author

@vuldin I'm going to leave item 4 as is (the removed tiered storage jobs), since that coverage should come from other tests and dropping the jobs reduces the time and cost of running CI.
