Collaborator

@jaymzh jaymzh commented Nov 13, 2025

This adds a new hook point (and sample plugin usage) that allows the
Chef run to be skipped based on some local criteria.

Example usage might be:

  • Device is on battery
  • Device is not connected to VPN/backhaul/etc.
  • Some global service meant to disable runs during an emergency

Previously I did this in pre_run or pre_start, but the problem with that
is that the only way is to force exit, which causes the logs to get
messed up because we never update the links. This provides a clean way
to skip the run but still update the chef.{cur,last} links so that it's
clear what has happened.

Sample output:

$ sudo chefctl -iv
[2025-11-13 18:27:44 +1000] DEBUG chefctl: Loading plugin at /etc/cinc/chefctl_hooks.rb.
[2025-11-13 18:27:44 +1000] DEBUG chefctl: Including registered plugin KrHook
[2025-11-13 18:27:44 +1000] DEBUG chefctl: Trying lock /var/lock/subsys/chefctl
[2025-11-13 18:27:44 +1000] DEBUG chefctl: Lock acquired: /var/lock/subsys/chefctl
[2025-11-13 18:27:44 +1000] INFO chefctl: taste-tester mode ends in < 1 hour, extending back to 1 hour
[2025-11-13 18:27:44 +1000] DEBUG chefctl: Skippinbg battery check due to --immediate flag

and

$ sudo chefctl
[2025-11-13 18:27:22 +1000] INFO chefctl: taste-tester mode ends in < 1 hour, extending back to 1 hour
[2025-11-13 18:27:22 +1000] WARN chefctl: Running on battery power, skipping Chef run
[2025-11-13 18:27:22 +1000] INFO chefctl: Plugin requested skipping chef run.

Signed-off-by: Phil Dibowitz <phil@ipom.com>

Contributor

@dafyddcrosby dafyddcrosby left a comment

I'm wondering if retval should still be 0 if the run is skipped. It's not a run failure, but the run didn't happen, either, which could be construed as a policy failure. As we have CHEFCLIENT_FAILURE, we may want to have a constant for CHEFCLIENT_SKIPPED so that, if the host gets in a state where it's regularly skipping, it can be easily identified by monitoring. I'll suggest CHEFCLIENT_SKIPPED be an exit code of 67 to show I'm not totally a fossil 😜

@jaymzh
Collaborator Author

jaymzh commented Nov 14, 2025

Well that has some risks because you don't necessarily want cron to think it failed.

I have two possible compromises that I'm cool with, but I much prefer the first.

option 1

The hook returns false (do the run), true (skip the run, return success), or an int, in which case we skip the run and use that int as the exit value. This allows the hook to decide how this should get handled.

option 2

We add a config option --error-on-skipped-run or some such, which forces it to exit with a pre-determined exit code whenever a run is skipped.

I think the first one is more flexible for all sorts of use-cases (you could have multiple return codes for WHY it failed, including some of them being 0)... but if there's a reason I'm missing to choose the second, I'm open to hearing it.
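
To make option 1 concrete, here is a minimal sketch of how the hook's return value could be mapped to an action and exit code. The helper name `interpret_skip_result` and the returned hash are hypothetical, not chefctl's actual API, and treating `nil` like `false` is an assumption on top of what is described above.

```ruby
# Hypothetical helper (not part of chefctl): interprets the value returned
# by a "skip run" hook, following option 1.
def interpret_skip_result(result)
  case result
  when false, nil
    # false means "do the run"; treating nil the same way is an assumption.
    { skip: false, exit_code: nil }
  when true
    # true means "skip the run and report success".
    { skip: true, exit_code: 0 }
  when Integer
    # An int means "skip the run and exit with this value" (0 is allowed).
    { skip: true, exit_code: result }
  else
    raise ArgumentError, "unexpected skip hook result: #{result.inspect}"
  end
end
```

With this shape, a hook could return a distinct non-zero code per skip reason for monitoring, while still returning `true` or `0` for skips that cron should treat as success.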

@jaymzh
Collaborator Author

jaymzh commented Nov 25, 2025

@dafyddcrosby - ping? Which would you prefer?

@dafyddcrosby
Contributor

I think the first option should be fine. As long as we're not in a position where no Chef run has actually happened for $period_of_time and we have no way of determining it was skipped, we should be good.

@jaymzh
Collaborator Author

jaymzh commented Dec 1, 2025

Awesome, thanks, I'll modify accordingly.

@jaymzh jaymzh force-pushed the new-skip-hook branch 2 times, most recently from 8893990 to cdbf21c Compare December 1, 2025 22:14
@jaymzh
Collaborator Author

jaymzh commented Dec 1, 2025

OK, option 1 implemented. Plus a few typos in comments fixed.

@meta-codesync

meta-codesync bot commented Dec 31, 2025

@dafyddcrosby has imported this pull request. If you are a Meta employee, you can view this in D89968754.
