Conversation
1638cf0 to
dd19144
Compare
48a001d to
f1a8825
Compare
f1a8825 to
42359a6
Compare
| type: git | ||
| source: https://github.com/redpanda-data/redpanda-ansible-collection.git | ||
| version: pr/138-fixes |
There was a problem hiding this comment.
Should this be removed after PR 138 is merged?
There was a problem hiding this comment.
Great catch, this should PR should be merged after the ansible PR related to the above branch is merged. Then this should be removed before being merged.
vuldin
left a comment
There was a problem hiding this comment.
PR Review: SASL Support
1. requirements.yml pinned to ephemeral branch — BLOCKER
The commit message itself says "temp: point to ansible collection git branch". This changes the redpanda.cluster source from Galaxy to git branch pr/138-fixes:
- name: redpanda.cluster
type: git
source: https://github.com/redpanda-data/redpanda-ansible-collection.git
version: pr/138-fixesThis must not merge to main. If that branch is deleted or force-pushed, all CI breaks. Either:
- Wait for the upstream PR to merge and a Galaxy release, then revert to
type: galaxywith a version pin. - Or pin to a specific commit SHA as an interim measure.
The git SHA logging added to .tasks/ansible.yml is useful for debugging this temp state but should probably also be removed before merge.
2. Unquoted password in Taskfile command lines
In .tasks/cluster.yml, the password is passed bare:
--extra-vars sasl_superuser_password={{.REDPANDA_SASL_PASSWORD}}
If the password contains spaces, $, backticks, !, etc., this breaks or behaves unexpectedly. Should be quoted. Also, the password will appear in ps output and CI logs. The sasl:idempotent variant compounds this by also setting redpanda_broker_no_log=false while passing the password on the same command line, and the check-idempotency.sh invocation repeats the full command (so logged twice).
Consider quoting at minimum:
--extra-vars 'sasl_superuser_password={{.REDPANDA_SASL_PASSWORD}}'
3. Concurrency group mismatch for aws fedora sasl
aws fedora sasl uses concurrency_group: aws-fd-cn, which is the connect group. Meanwhile aws ubuntu sasl uses concurrency_group: aws-sl. This is asymmetric — fedora SASL will queue behind connect jobs and vice versa. Was this intentional for resource management, or should it use aws-sl (or a new group)?
4. Unstable tiered storage coverage removed
All four unstable tiered storage jobs are replaced by SASL jobs:
unstable aws fedora tiered/tiered large(includingis4gen.4xlargeARM64)unstable aws ubuntu tiered/tiered large
Stable tiered storage jobs still exist, but unstable (pre-release) tiered storage testing — especially the ARM64 large-instance variant — is completely gone. If unstable tiered regressions are a concern, this is a coverage gap.
5. REDPANDA_SASL_PASSWORD not in Buildkite Docker environment
None of the SASL pipeline jobs pass REDPANDA_SASL_PASSWORD through the Docker environment. This works because Taskfile.yml defaults it to cicd-sasl-password, and the test scripts also default to the same value. But if you ever want to pull the password from the secrets manager instead of using a hardcoded default, it would need to be added to the environment lists.
6. Minor: Inconsistent no_log on license task
In operation-apply-license.yml, the "Set Redpanda license (string)" task has unconditional no_log: true, while every other task uses no_log: "{{ kafka_enable_authorization | default(false) }}". The license string is sensitive so this may be intentional, but it means non-SASL license failures produce zero debug output. A comment explaining the intent would help.
Non-blocking observations
- Password in process list:
rpk_optsinterpolatessasl_superuser_passworddirectly into shell commands.no_logprevents Ansible console output but not/proc/<pid>/cmdlineorpsvisibility on target hosts. A longer-term improvement would be rpk profiles or env vars (RPK_USER/RPK_PASS). Fine for now. - Hardcoded defaults in
manage-sasl-users.yml:producer-secretandconsumer-secretwill be silently used if env vars aren't set. This is a template/example playbook so probably fine, but worth a comment in the file. - Test scripts leak password in CI logs: The
test:cluster:saslandtest:schema:sasltasks runrpk/curlwith the password in shell commands that get logged. Acceptable for CI with a throwaway password, but worth knowing.
vuldin
left a comment
There was a problem hiding this comment.
I just reviewed this PR and it looks to contain all the functionality from #243 but without the service accounts being created (which I believe is the right approach since it will rely on the ephemeral users instead). I know @rockwotj mentioned knowing of future changes to how this works, but I assume any incoming changes won't break this approach.
| type: git | ||
| source: https://github.com/redpanda-data/redpanda-ansible-collection.git | ||
| version: pr/138-fixes |
There was a problem hiding this comment.
Great catch, this should PR should be merged after the ansible PR related to the above branch is merged. Then this should be removed before being merged.
|
@vuldin I'm going to leave 4 as is (removed TS jobs) as that should be covered by other tests and reduces the overall spend in time/cash to run the CI. |
No description provided.