Conversation

@amathuria
Contributor

Using --rerun without --ceph currently defaults to the main branch. This change sources the branch from the original run being rerun, making reruns more intuitive. It also warns if --ceph is specified but does not match the original branch.

Comment on lines 195 to 201
if conf.ceph_branch is not None and conf.ceph_branch != ceph_branch:
    log.warning('--ceph specified but does not match with '
                'rerun --ceph branch: %s',
                ceph_branch)
else:
    log.info('Using rerun ceph branch=%s', ceph_branch)
    conf.ceph_branch = ceph_branch
Contributor

The logic is a bit tangled, and there is a case where the value of --ceph is overridden (though with the same value). Also, I think you still want to always print out the branch that is actually used, don't you? So, maybe:

if conf.ceph_branch is None:
    conf.ceph_branch = ceph_branch
else:
    if conf.ceph_branch != ceph_branch:
        log.warning(f"--ceph specified but does not match the rerun job's branch {ceph_branch}")
log.info(f"Using ceph branch={conf.ceph_branch}")

Thoughts?
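The suggested control flow can be sketched as a standalone helper. This is a hypothetical function written for illustration, not the actual teuthology code; it assumes the parsed --ceph value and the branch recorded in the run being rerun are passed in as arguments.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rerun_sketch")

def resolve_ceph_branch(requested_branch, rerun_branch):
    # Prefer an explicit --ceph value; fall back to the branch
    # recorded in the run being rerun; warn on a mismatch. The
    # branch actually used is always logged exactly once.
    if requested_branch is None:
        requested_branch = rerun_branch
    elif requested_branch != rerun_branch:
        log.warning("--ceph specified but does not match the rerun "
                    "job's branch %s", rerun_branch)
    log.info("Using ceph branch=%s", requested_branch)
    return requested_branch

# No --ceph given: the rerun's branch wins.
assert resolve_ceph_branch(None, "wip-foo") == "wip-foo"
# Explicit --ceph wins, with a warning if it differs.
assert resolve_ceph_branch("main", "wip-foo") == "main"
```

Folding the mismatch check into the fallback branch this way removes the redundant reassignment and guarantees the log line reflects whichever branch was finally chosen.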

Contributor Author

@kshtsk yes agreed! I had discussed this with Zack as well, will ping you once I address it :)

@amathuria amathuria force-pushed the wip-amat-fix-rerun-branch branch from ee4a9f6 to c08e30c Compare November 12, 2025 16:53
@amathuria amathuria marked this pull request as ready for review December 10, 2025 12:36
@amathuria amathuria requested a review from a team as a code owner December 10, 2025 12:36
@amathuria amathuria requested review from VallariAg, kamoltat and kshtsk and removed request for a team December 10, 2025 12:36
@amathuria
Contributor Author

This is what a rerun would look like now:

(virtualenv) amathuri@teuthology:~/teuthology$ teuthology-suite --rerun skanta-2025-12-09_03:55:08-rados-wip-bharath10-testing-2025-12-08-1514-distro-default-smithi -m gibba --distro centos --distro-version 9 --priority 50 --dry-run
2025-12-10 12:24:26,134.134 INFO:teuthology.suite:Using rerun seed=4154
2025-12-10 12:24:26,134.134 INFO:teuthology.suite:Using rerun subset=(111, 120000)
2025-12-10 12:24:26,134.134 INFO:teuthology.suite:Using rerun no_nested_subset=False
2025-12-10 12:24:26,134.134 INFO:teuthology.suite:Using ceph branch=wip-bharath10-testing-2025-12-08-1514
2025-12-10 12:24:26,135.135 INFO:teuthology.suite:Using rerun ceph sha1=81b05d7a71071a84cda7f3c891fc17481ced63f7
2025-12-10 12:24:26,138.138 INFO:teuthology.suite.run:Checking for expiration (None)
2025-12-10 12:24:26,138.138 INFO:teuthology.suite.run:kernel sha1: distro
2025-12-10 12:24:26,553.553 INFO:teuthology.suite.run:ceph sha1 explicitly supplied
2025-12-10 12:24:26,553.553 INFO:teuthology.suite.run:ceph sha1: 81b05d7a71071a84cda7f3c891fc17481ced63f7
2025-12-10 12:24:26,818.818 INFO:teuthology.suite.run:ceph version: 20.3.0-4460.g81b05d7a
2025-12-10 12:24:26,938.938 INFO:teuthology.suite.run:ceph-ci branch: wip-bharath10-testing-2025-12-08-1514 81b05d7a71071a84cda7f3c891fc17481ced63f7
2025-12-10 12:24:26,941.941 INFO:teuthology.repo_utils:Cloning https://git.ceph.com/ceph-ci.git wip-bharath10-testing-2025-12-08-1514 from upstream
2025-12-10 12:25:36,858.858 INFO:teuthology.suite.run:teuthology branch: main 258eb6279f4d7fcd4b45c82e521f2a2e799d7f33
2025-12-10 12:25:36,902.902 INFO:teuthology.suite.build_matrix:Subset=111/120000
2025-12-10 12:26:46,033.033 INFO:teuthology.suite.run:Suite rados in /cephfs/home/amathuri/src/git.ceph.com_ceph-c_81b05d7a71071a84cda7f3c891fc17481ced63f7/qa/suites/rados generated 272 jobs (not yet filtered or merged)
2025-12-10 12:26:51,422.422 INFO:teuthology.suite.run:Scheduling rados/cephadm/workunits/{0-distro/ubuntu_22.04 agent/on mon_election/connectivity task/test_cephadm}
2025-12-10 12:26:51,422.422 INFO:teuthology.suite.run:Scheduling rados/basic/{ceph clusters/{fixed-2} mon_election/connectivity msgr-failures/many msgr/async-v1only objectstore/{bluestore/{alloc$/{bitmap} base mem$/{low} onode-segment$/{1M} write$/{v2/{compr$/{no$/{no}} v2}}}} rados supported-random-distro$/{centos_latest} tasks/rados_api_tests}
2025-12-10 12:26:51,422.422 INFO:teuthology.suite.run:Scheduling rados/dashboard/{0-single-container-host debug/mgr mon_election/classic random-objectstore$/{bluestore-comp-snappy} tasks/dashboard}
2025-12-10 12:26:51,422.422 INFO:teuthology.suite.run:Scheduling rados/encoder/{0-start 1-tasks supported-random-distro$/{centos_latest}}
2025-12-10 12:26:51,422.422 INFO:teuthology.suite.run:Scheduling rados/objectstore/{backends/fusestore supported-random-distro$/{centos_latest}}
2025-12-10 12:26:51,422.422 INFO:teuthology.suite.run:Scheduling rados/dashboard/{0-single-container-host debug/mgr mon_election/connectivity random-objectstore$/{bluestore-hybrid} tasks/e2e}
2025-12-10 12:26:51,422.422 INFO:teuthology.suite.run:Scheduling rados/singleton-nomsgr/{all/lazy_omap_stats_output mon_election/classic rados supported-random-distro$/{ubuntu_latest}}
2025-12-10 12:26:51,423.423 INFO:teuthology.suite.run:Scheduling rados/thrash/{0-size-min-size-overrides/3-size-2-min-size 1-pg-log-overrides/short_pg_log 2-recovery-overrides/{more-async-recovery} 3-scrub-overrides/{max-simultaneous-scrubs-5} backoff/normal ceph clusters/{fixed-4} crc-failures/default d-balancer/crush-compat mon_election/connectivity msgr-failures/osd-dispatch-delay msgr/async-v1only objectstore/{bluestore/{alloc$/{bitmap} base mem$/{low} onode-segment$/{none} write$/{v1/{compr$/{yes$/{zstd}} v1}}}} rados supported-random-distro$/{centos_latest} thrashers/morepggrow thrashosds-health workloads/pool-snaps-few-objects}
2025-12-10 12:26:51,423.423 INFO:teuthology.suite.run:Scheduling rados/cephadm/workunits/{0-distro/centos_9.stream agent/on mon_election/connectivity task/test_iscsi_container/{centos_9.stream test_iscsi_container}}
2025-12-10 12:26:51,423.423 INFO:teuthology.suite.run:Scheduling rados/cephadm/workunits/{0-distro/ubuntu_22.04 agent/on mon_election/connectivity task/test_monitoring_stack_basic}
2025-12-10 12:26:51,423.423 INFO:teuthology.suite.run:Scheduling rados/dashboard/{0-single-container-host debug/mgr mon_election/connectivity random-objectstore$/{bluestore-comp-zlib} tasks/dashboard}
2025-12-10 12:26:51,423.423 INFO:teuthology.suite.run:Scheduling rados/upgrade/parallel/{0-random-distro$/{centos_9.stream_runc} 0-start 1-tasks mon_election/connectivity overrides/ignorelist_health upgrade-sequence workload/{ec-rados-default rados_api rados_loadgenbig rbd_import_export test_rbd_api test_rbd_python}}
2025-12-10 12:26:51,423.423 INFO:teuthology.suite.run:Scheduling rados/cephadm/workunits/{0-distro/centos_9.stream agent/off mon_election/classic task/test_rgw_multisite}
2025-12-10 12:26:51,423.423 INFO:teuthology.suite.run:Scheduling rados/mgr/{clusters/{2-node-mgr} debug/mgr distro/{centos_latest} mgr_ttl_cache/disable mon_election/classic random-objectstore$/{bluestore/{alloc$/{stupid} base mem$/{normal-2} onode-segment$/{512K-onoff} write$/{v1/{compr$/{no$/{no}} v1}}}} tasks/{1-install 2-ceph 3-mgrmodules 4-units/prometheus}}
2025-12-10 12:26:51,423.423 INFO:teuthology.suite.run:Scheduling rados/thrash-erasure-code-isa/{arch/x86_64 ceph clusters/{fixed-4} ec_optimizations/ec_optimizations_on mon_election/connectivity msgr-failures/osd-dispatch-delay objectstore/{bluestore/{alloc$/{stupid} base mem$/{normal-1} onode-segment$/{none} write$/{v2/{compr$/{yes$/{lz4}} v2}}}} rados recovery-overrides/{more-partial-recovery} supported-random-distro$/{centos_latest} thrashers/pggrow_host thrashosds-health workloads/ec-rados-plugin=isa-k=10-m=4}
2025-12-10 12:26:51,423.423 INFO:teuthology.suite.run:Scheduling rados/cephadm/workunits/{0-distro/centos_9.stream_runc agent/on mon_election/connectivity task/test_set_mon_crush_locations}
2025-12-10 12:26:51,423.423 INFO:teuthology.suite.run:Scheduling rados/dashboard/{0-single-container-host debug/mgr mon_election/classic random-objectstore$/{bluestore-hybrid} tasks/e2e}
2025-12-10 12:26:51,423.423 INFO:teuthology.suite.run:Scheduling rados/singleton-nomsgr/{all/lazy_omap_stats_output mon_election/connectivity rados supported-random-distro$/{ubuntu_latest}}
2025-12-10 12:26:51,423.423 INFO:teuthology.suite.run:Scheduling rados/thrash-old-clients/{0-distro$/{centos_9.stream} 0-size-min-size-overrides/3-size-2-min-size 1-install/squid backoff/normal ceph clusters/{three-plus-one} d-balancer/on mon_election/classic msgr-failures/osd-delay rados thrashers/default thrashosds-health workloads/snaps-few-objects}
2025-12-10 12:26:51,423.423 INFO:teuthology.suite.run:Scheduling rados/cephadm/smoke/{0-distro/centos_9.stream 0-nvme-loop agent/on fixed-2 mon_election/connectivity start}
2025-12-10 12:26:51,424.424 INFO:teuthology.suite.run:Scheduling rados/cephadm/workunits/{0-distro/centos_9.stream agent/on mon_election/connectivity task/test_cephadm}
2025-12-10 12:26:51,424.424 INFO:teuthology.suite.run:Suite rados in /cephfs/home/amathuri/src/git.ceph.com_ceph-c_81b05d7a71071a84cda7f3c891fc17481ced63f7/qa/suites/rados scheduled 21 jobs.
2025-12-10 12:26:51,424.424 INFO:teuthology.suite.run:251/272 jobs were filtered out.
2025-12-10 12:26:51,424.424 INFO:teuthology.suite.run:Scheduled 21 jobs in total.
2025-12-10 12:26:51,424.424 INFO:teuthology.suite.run:Test results viewable at https://pulpito.ceph.com/amathuri-2025-12-10_12:24:26-rados-wip-bharath10-testing-2025-12-08-1514-distro-default-gibba/

The main difference is that we now pick up the ceph branch and sha1 from the rerun configuration instead of defaulting to main:

2025-12-10 12:24:26,134.134 INFO:teuthology.suite:Using ceph branch=wip-bharath10-testing-2025-12-08-1514
2025-12-10 12:24:26,135.135 INFO:teuthology.suite:Using rerun ceph sha1=81b05d7a71071a84cda7f3c891fc17481ced63f7

Using --rerun without --ceph currently defaults to the main branch.
This change sources the branch from the original run being rerun, making reruns more intuitive.
Warns if --ceph is specified and does not match the original branch.

Fixes: https://tracker.ceph.com/issues/68872
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
@amathuria amathuria force-pushed the wip-amat-fix-rerun-branch branch from c08e30c to 830fbdd Compare December 10, 2025 12:41
The suite to schedule
--wait Block until the suite is finished
-c <ceph>, --ceph <ceph> The ceph branch to run against
[default: {default_ceph_branch}]
Contributor

Could you please remind me why this line has been deleted? I thought there should just be an extra comment added, noting that when --rerun is used the branch is taken from the referred run instead.

Member

The default being applied in docopt meant that we couldn't differentiate "user passed --ceph main" from "user did not pass --ceph at all".
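teuthology parses its CLI with docopt, but the same ambiguity is easy to demonstrate with the stdlib argparse, used here purely for illustration:

```python
import argparse

# With a baked-in default, "teuthology-suite -c main" and running with
# no -c at all parse to the same value, so later code cannot tell them
# apart:
with_default = argparse.ArgumentParser()
with_default.add_argument("-c", "--ceph", default="main")
assert (with_default.parse_args([]).ceph ==
        with_default.parse_args(["--ceph", "main"]).ceph)

# With no default (i.e. None), the absence of --ceph is observable, and
# the branch from the original run can be filled in afterwards:
no_default = argparse.ArgumentParser()
no_default.add_argument("-c", "--ceph")
assert no_default.parse_args([]).ceph is None
assert no_default.parse_args(["--ceph", "main"]).ceph == "main"
```

Dropping the default from the usage string and resolving it later in code is what lets the rerun path substitute the original run's branch only when the user stayed silent.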

Contributor

We had the TEUTH_CEPH_BRANCH environment variable, which provides the default for --ceph when it is not given. I am still unsure whether we want to keep this functionality.
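If the environment-variable fallback is kept, one possible precedence order would look like the sketch below. This is an assumption about how the pieces could be combined, not how teuthology currently resolves the branch; the function name is hypothetical.

```python
import os

def pick_ceph_branch(cli_branch, rerun_branch, env=os.environ):
    # Hypothetical precedence sketch: an explicit --ceph wins, then
    # TEUTH_CEPH_BRANCH from the environment, then the branch recorded
    # in the run being rerun.
    if cli_branch is not None:
        return cli_branch
    env_branch = env.get("TEUTH_CEPH_BRANCH")
    if env_branch:
        return env_branch
    return rerun_branch

assert pick_ceph_branch("wip-x", "wip-y", env={}) == "wip-x"
assert pick_ceph_branch(None, "wip-y",
                        env={"TEUTH_CEPH_BRANCH": "quincy"}) == "quincy"
assert pick_ceph_branch(None, "wip-y", env={}) == "wip-y"
```

Passing the environment in as a parameter keeps the resolution order explicit and easy to test without mutating os.environ.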

Member

@zmc zmc left a comment

This looks good to me! @kshtsk, what do you think?
