Simple bash script to automatically restart services if they fail too often
- create the following files:

      touch ar.conf # optional
      touch test-command
      touch restart-command
      touch notify-command

- when the script is started, the content of the `test-command` file is executed. If it returns a non-zero exit code, then under certain conditions the content of the `restart-command` file is executed.
- a restart only happens if all of the following hold:
  - `test-command` failed at least `AR_FAILS_BEFORE_RESTART` times
  - failures happen sufficiently often (of the latest `AR_FAILS_BEFORE_RESTART` failures, the earliest one and the last one are separated by less than `AR_FAILS_MAX_SEP` seconds)
  - the latest restart didn't happen too recently (at least `AR_RESTARTS_MIN_SEP` seconds should pass between restarts)
  - `AR_RESTART=1` (it is by default)
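
The restart conditions above can be sketched roughly as follows. This is an illustration, not the script's actual code: the function name `should_restart` is hypothetical, and it assumes each line of the failure and restart logs holds a single Unix timestamp in seconds, which this README does not specify.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes one Unix timestamp (seconds) per log line;
# the real log format may differ.

AR_FAILS_BEFORE_RESTART=${AR_FAILS_BEFORE_RESTART:-3}
AR_FAILS_MAX_SEP=${AR_FAILS_MAX_SEP:-300}
AR_RESTARTS_MIN_SEP=${AR_RESTARTS_MIN_SEP:-600}
AR_RESTART=${AR_RESTART:-1}

# should_restart FAILS_LOG RESTARTS_LOG NOW
# Returns 0 if every restart condition holds, 1 otherwise.
should_restart() {
  local fails_log="$1"
  local restarts_log="$2"
  local now="$3"
  local count first last last_restart

  # Restarts must be enabled.
  [ "$AR_RESTART" = "1" ] || return 1

  # At least AR_FAILS_BEFORE_RESTART failures must be recorded.
  [ -f "$fails_log" ] || return 1
  count=$(( $(wc -l < "$fails_log") ))
  [ "$count" -ge "$AR_FAILS_BEFORE_RESTART" ] || return 1

  # The latest AR_FAILS_BEFORE_RESTART failures must span
  # less than AR_FAILS_MAX_SEP seconds.
  first=$(tail -n "$AR_FAILS_BEFORE_RESTART" "$fails_log" | head -n 1)
  last=$(tail -n 1 "$fails_log")
  [ $(( last - first )) -lt "$AR_FAILS_MAX_SEP" ] || return 1

  # The previous restart, if any, must be at least
  # AR_RESTARTS_MIN_SEP seconds old.
  if [ -s "$restarts_log" ]; then
    last_restart=$(tail -n 1 "$restarts_log")
    [ $(( now - last_restart )) -ge "$AR_RESTARTS_MIN_SEP" ] || return 1
  fi

  return 0
}
```

With the default values, three failures at seconds 1000, 1100 and 1200 qualify for a restart (they span 200 seconds), while failures at 1000, 1100 and 1400 do not (they span 400 seconds).
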
- when all the above conditions are met and a restart is triggered, `notify-command` is also executed (unless `AR_NOTIFY=0`)
- timestamps of failures and restarts are written to `AR_FAILS_LOG` and `AR_RESTARTS_LOG`; these files should be preserved for the script to work correctly
- environment variables can be read from the `ar.conf` file if it exists. The full list of environment variables and their default values is:

      export AR_DEBUG=0 # additional logging (0/1)
      export AR_SERVICE=$(hostname) # name of the service
      export AR_USE_LOCK=1 # use lock file (0/1)
      export AR_LOG=logs/ar.log # main log file
      export AR_FAILS_LOG=ar-fails # log file to store failures
      export AR_RESTARTS_LOG=ar-restarts # log file to store restarts
      export AR_EXEC_LOG=logs/ar-exec-out.log # stdout for *-command. Use /dev/null to ignore output
      export AR_EXEC_ERR=logs/ar-exec-err.log # stderr for *-command. Use /dev/null to ignore output
      export AR_FAILS_BEFORE_RESTART=3 # min number of failures to trigger restart
      export AR_FAILS_MAX_SEP=300 # max separation of failures in seconds so that they count as consecutive
      export AR_RESTARTS_MIN_SEP=600 # min number of seconds between two restarts
      export AR_TEST_COMMAND_FILE=test-command # name of the file with test command
      export AR_RESTART_COMMAND_FILE=restart-command # name of the file with restart command
      export AR_NOTIFY_COMMAND_FILE=notify-command # name of the file with notification command
      export AR_RESTART=1 # switch to quickly enable/disable restarts
      export AR_NOTIFY=1 # switch to quickly enable/disable notifications

- if `AR_USE_LOCK=1`, the script creates a `.ar.lock` file on startup, which prevents multiple instances of the script from running at the same time
- the exit code of this script is:
  - `0` if the test passed
  - `1` if the test failed, but a restart wasn't triggered
  - `2` if a restart was or should have been triggered (depending on the value of `AR_RESTART`)
  - `3` if initial validation failed and one of the `*-command` files doesn't exist or is empty
  - `4` if the lock file exists, so the script didn't run
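
A caller (for example a cron wrapper) might translate these exit codes into messages along the following lines. The helper name `describe_ar_exit` and the path `./ar.sh` are hypothetical, since this README does not name the script file; only the code-to-meaning mapping is taken from the list above.

```shell
#!/usr/bin/env bash
# Sketch only: maps the documented exit codes to short messages.

# describe_ar_exit CODE
describe_ar_exit() {
  case "$1" in
    0) echo "test passed" ;;
    1) echo "test failed, restart not triggered" ;;
    2) echo "restart was or should have been triggered" ;;
    3) echo "validation failed: a *-command file is missing or empty" ;;
    4) echo "lock file exists, script did not run" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# Usage (./ar.sh is a hypothetical path to the script):
#   ./ar.sh; describe_ar_exit "$?"
```
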
- files in the `logs/` subfolder can be rotated freely. The files `ar-fails` and `ar-restarts`, or at least their trailing lines, should be preserved.
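
If `ar-fails` or `ar-restarts` grow too large, one possible way to shrink them while keeping their trailing lines is sketched below. The helper `trim_log` is hypothetical and not part of the script, and the number of lines worth keeping is an assumption: this README does not say how many trailing lines the script reads, so keeping well over `AR_FAILS_BEFORE_RESTART` lines is the cautious choice. It is probably safest to run it while the script itself is not running.

```shell
#!/usr/bin/env bash
# Sketch only: keep the last KEEP lines of FILE, discard the rest.

# trim_log FILE KEEP
trim_log() {
  local file="$1"
  local keep="$2"
  local tmp

  # Nothing to do if the log does not exist yet.
  [ -f "$file" ] || return 0

  tmp=$(mktemp) || return 1
  # Write the tail to a temp file first, then copy it back in place
  # so the original file keeps its permissions.
  tail -n "$keep" "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage:
#   trim_log ar-fails 100
#   trim_log ar-restarts 100
```
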