Add Apache bot-block.conf for per-IP blocking #23

Open: rdhyee wants to merge 4 commits into master from apache-bot-block
rdhyee (Contributor) commented Feb 27, 2026

Summary

Adds an Ansible-managed Apache conf that blocks specific IPs at the web-server level — before requests reach mod_wsgi/Django — with near-zero CPU cost. IPs are configured per-environment via group_vars.

Companion to Gluejar/regluit#1094 (BotBlockingMiddleware for UA-based blocking). The two layers work together:

| Layer | What it blocks | Where |
| --- | --- | --- |
| BotBlockingMiddleware (regluit#1094) | Known bot UAs (ClaudeBot, GPTBot, …) | Django |
| bot-block.conf (this PR) | Egregious single-IP offenders | Apache, pre-WSGI |

Changes

roles/regluit_prod/templates/bot-block.conf.j2 (new)

  • <RequireAll> / Require not ip block for each entry in blocked_ips
  • Renders empty (no-op) if blocked_ips is undefined or empty — safe for test/ondeck
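
The rendering behaviour described in the two bullets above can be sketched in Python. This is only an illustration, not the template itself: the real file is a Jinja2 template, and the function name `render_bot_block`, the `<Location "/">` wrapper, and the `Require all granted` line are assumptions made here so the output is a plausible Apache 2.4 fragment.

```python
def render_bot_block(blocked_ips=None):
    """Return Apache 2.4 config text denying each listed IP, or '' if none.

    Hypothetical stand-in for bot-block.conf.j2; the wrapper directives are
    assumptions, only the <RequireAll> / "Require not ip" shape is from the PR.
    """
    if not blocked_ips:
        # Undefined or empty list: emit an empty file, which Apache treats
        # as a no-op (the test/ondeck case).
        return ""
    lines = [
        '<Location "/">',            # assumed container; the template may differ
        "    <RequireAll>",
        "        Require all granted",
    ]
    # One "Require not ip" line per configured address.
    lines += [f"        Require not ip {ip}" for ip in blocked_ips]
    lines += ["    </RequireAll>", "</Location>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_bot_block(["216.73.216.178"]))
```

Returning an empty string for a missing list mirrors the "renders empty" guarantee, so environments without `blocked_ips` get a conf file Apache can load without any access rules in it.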

roles/regluit_prod/tasks/apache.yml

  • Deploys template to /etc/apache2/conf-available/bot-block.conf
  • Runs a2enconf bot-block and triggers Apache restart

group_vars/production/vars.yml

  • Adds blocked_ips list with 216.73.216.178 (ClaudeBot single IP, 229K req on 2026-02-26)

Adding / removing IPs

Edit blocked_ips in group_vars/production/vars.yml and re-run the playbook. No server login required.

Test plan

  • ansible-playbook -i hosts setup-test.yml succeeds
  • curl -I https://test.unglue.it/ returns 403 from a blocked IP, 200 from a clean IP
  • With empty blocked_ips, conf file renders as empty and Apache starts cleanly

rdhyee and others added 3 commits February 26, 2026 16:08
Deploys a bot-block.conf to all servers from a blocked_ips variable
defined per environment in group_vars. Apache rejects listed IPs before
the request reaches mod_wsgi/Django — near-zero CPU cost.

UA-based bot blocking (ClaudeBot, GPTBot, etc.) lives in Django's
BotBlockingMiddleware (Gluejar/regluit PR #1094). This conf is the
supplemental layer for egregious single-IP offenders.

Changes:
- templates/bot-block.conf.j2: new Apache conf, renders empty if
  blocked_ips is undefined or empty (safe for test/ondeck with no list)
- tasks/apache.yml: deploy + a2enconf the new conf
- group_vars/production/vars.yml: adds blocked_ips with 216.73.216.178
  (ClaudeBot single IP, 229K req on 2026-02-26)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Two critical gaps addressed:

1. UA blocking moved to Apache (pre-WSGI). SetEnvIfNoCase rules mirror
   Django's BAD_ROBOTS list. Apache rejects matched requests before
   mod_wsgi spawns a thread — protects all 30 WSGI slots instead of
   occupying one to return a 403 from Django.

2. Tencent Cloud 43.173.0.0/16 added to blocked_cidrs. On 2026-02-26
   this single /16 sent 14,797 req across 1,495 unique IPs — too
   distributed for per-IP blocking, needs CIDR treatment.

Also splits blocked_ips (single-host offenders) from blocked_cidrs
(network ranges) for clearer intent in vars files.
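
To illustrate why the /16 needs CIDR treatment rather than more per-IP entries, here is a small Python sketch of the two kinds of match. The helper `is_blocked` is hypothetical and only models the decision; in production the matching is done by Apache, not by Python.

```python
import ipaddress

# Values taken from this PR; the variable names mirror the group_vars keys.
blocked_ips = ["216.73.216.178"]
blocked_cidrs = ["43.173.0.0/16"]


def is_blocked(client_ip, ips=blocked_ips, cidrs=blocked_cidrs):
    """Return True if client_ip equals a blocked host or falls in a blocked range."""
    addr = ipaddress.ip_address(client_ip)
    # Single-host offenders: exact address comparison.
    if any(addr == ipaddress.ip_address(ip) for ip in ips):
        return True
    # Network ranges: membership test against each CIDR block.
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


if __name__ == "__main__":
    # A /16 covers 65,536 addresses, so one entry replaces any number of
    # per-IP lines for hosts inside it.
    print(ipaddress.ip_network("43.173.0.0/16").num_addresses)
    print(is_blocked("43.173.12.34"), is_blocked("43.174.0.1"))
```

Keeping the two lists separate, as this commit does, makes it obvious in the vars file whether an entry targets one host or a whole network.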

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add apache-log-gzip cron: gzip logs older than 1 day at 03:00
- Update apache-log-cleanup: delete .log.gz older than 30 days (was
  deleting .log older than 14 days, no compression)
- Access logs compress ~86%; 4.9G → 943M on first manual run today
- Update restart-workaround comment: bot mitigation now live on prod
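
The two-stage retention policy in the bullets above (compress after 1 day, delete archives after 30 days) can be sketched in Python as follows. This is not the cron job from the commit, whose actual commands are not shown here; the function names and the `*.log` / `*.log.gz` glob patterns are assumptions for illustration.

```python
import gzip
import shutil
import time
from pathlib import Path

DAY = 86400  # seconds


def gzip_old_logs(log_dir, min_age_days=1, now=None):
    """Compress *.log files last modified more than min_age_days ago."""
    now = time.time() if now is None else now
    compressed = []
    for path in sorted(Path(log_dir).glob("*.log")):
        if now - path.stat().st_mtime > min_age_days * DAY:
            target = path.with_name(path.name + ".gz")
            with open(path, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            # Carry the original mtime over so the 30-day clock below
            # counts from when the log was written, not when it was zipped.
            shutil.copystat(path, target)
            path.unlink()
            compressed.append(target)
    return compressed


def delete_old_archives(log_dir, max_age_days=30, now=None):
    """Delete *.log.gz files last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(Path(log_dir).glob("*.log.gz")):
        if now - path.stat().st_mtime > max_age_days * DAY:
            path.unlink()
            removed.append(path)
    return removed
```

The age filter in the first stage is what keeps the log Apache is currently writing to from being compressed underneath it.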

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

rdhyee added labels layer:apache (Apache/mod_wsgi configuration), status:ready-to-implement (Waiting on review/merge), type:bot-mitigation (Bot traffic blocking and mitigation) on Feb 27, 2026

Four new UAs observed flooding prod on 2026-02-28 (03:00 UTC):
- meta-webindexer/1.1: 206 req — Meta bot, different UA than meta-externalagent
  (same 57.141.x.x network, just switched UA string)
- DataForSeoBot/1.0: 25 req — explicit SEO crawler
- QIHU 360SE: 32 req — Chinese bot/scraper
- MetaSr 1.0: 26 req — old Chinese browser UA used by scrapers

All four were passing through the Apache block and reaching WSGI slots.
Applied live to prod (manual bot-block.conf) before this commit.
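
A short Python sketch of why the UA switch slipped past the existing rules. `SetEnvIfNoCase` does case-insensitive, unanchored pattern matching, which the helper below approximates with literal substrings. The `BLOCKED_BEFORE` list is an illustrative subset built from names mentioned in this PR, not the real conf contents, and `is_bad_bot` is a hypothetical helper.

```python
import re

# Illustrative subset only; the actual pattern list in bot-block.conf is longer.
BLOCKED_BEFORE = ["ClaudeBot", "GPTBot", "meta-externalagent", "amazonbot"]
# The four UA fragments this commit adds.
ADDED = ["meta-webindexer", "DataForSeoBot", "QIHU 360SE", "MetaSr"]


def is_bad_bot(user_agent, patterns):
    """Case-insensitive substring match, approximating SetEnvIfNoCase User-Agent."""
    return any(
        re.search(re.escape(pattern), user_agent, re.IGNORECASE)
        for pattern in patterns
    )


if __name__ == "__main__":
    ua = "meta-webindexer/1.1"
    # The old list keys on "meta-externalagent", so the renamed UA passes.
    print(is_bad_bot(ua, BLOCKED_BEFORE))
    # With the new patterns it is caught.
    print(is_bad_bot(ua, BLOCKED_BEFORE + ADDED))
```

Because the match is on the UA string alone, a crawler that renames itself needs a new pattern each time, which is the gap this commit closes for these four.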

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

rdhyee (Contributor, Author) commented Feb 28, 2026

2026-02-28 — Live patch: 4 new bot UAs added

Prod was slow at ~03:04 UTC (load avg 4.59, five Apache workers each consuming 40–98% CPU). Access log analysis identified four UAs not in the existing block list:

| UA | Requests (last 2000 log lines) | Notes |
| --- | --- | --- |
| meta-webindexer/1.1 | 206 | Meta bot; switched UA from meta-externalagent (which was blocked); same 57.141.x.x network |
| QIHU 360SE | 32 | Chinese bot/scraper |
| MetaSr 1.0 | 26 | Old Chinese browser UA used by scrapers |
| DataForSeoBot/1.0 | 25 | SEO crawler |
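
For anyone repeating this kind of analysis, a per-UA count over the tail of an access log can be sketched as below. It assumes Apache's combined log format, where the user agent is the last quoted field; the function name `top_user_agents` is hypothetical and this is not necessarily how the counts above were produced.

```python
import re
from collections import Counter, deque

# Combined log format ends with: "referer" "user-agent"
UA_RE = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def top_user_agents(lines, last_n=2000, top=10):
    """Count user agents in the final last_n lines, most frequent first."""
    tail = deque(lines, maxlen=last_n)  # keeps only the last last_n lines
    counts = Counter()
    for line in tail:
        match = UA_RE.search(line)
        if match:  # skip lines that are not in combined format
            counts[match.group(1)] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    # The log path is an assumption; adjust to the vhost's actual access log.
    with open("/var/log/apache2/access.log", errors="replace") as fh:
        for ua, n in top_user_agents(fh):
            print(f"{n:6d}  {ua}")
```

Passing the open file straight into `deque` means only the last `last_n` lines are held in memory, which matters for multi-gigabyte logs.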

The four UAs were added live to /etc/apache2/conf-available/bot-block.conf and Apache was reloaded (apache2ctl configtest && service apache2 reload). Commit cb966eb adds them to the Ansible template so the next Ansible run will bake them in permanently.

Note: amazonbot was already in the live conf and correctly returning 403s — those 128 requests in the log were being rejected at Apache level.
