update from Gluejar#94

Open
eshellman wants to merge 1030 commits into EbookFoundation:master from Gluejar:master
@eshellman
No description provided.

eshellman and others added 30 commits January 27, 2025 11:22
implement turnstile on the search form, shorten session life
…Phase 1)

Remove pledge/purchase URL patterns, delete 7 dead view classes/functions
(PledgeView, PurchaseView, PledgeRechargeView, PledgeCancelView,
PledgeModifiedView, download_campaign, download_purchased), refactor
DownloadView to extend FormView directly, and simplify home/campaign list/
FundView/edition_uploads to remove REWARDS/B2U code paths.

Clean up templates (work_action, book_panel, home, manage_account) to
remove broken URL references to deleted routes.

Part of #1081 — preparing for Django migration by removing unused
campaign types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove references to removed pledge/b2u/ending facets from templates:
- explore.html: remove top_pledge section and "Buy to unglue" link
- about_lightbox_footer.html, libraries.html: ending → t4u
- faq.html, faq_b2u.html: pledge/b2u links → t4u

Harden CampaignListView: remove stale pledged/pledges/almost/soon
facets, raise Http404 for unrecognized facets instead of returning
Campaign.objects.all().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete CampaignUiTests, PledgingUiTests, and UnifiedCampaignTests —
these test REWARDS pledge flows via /pledge/ URLs that were removed
in Phase 1. Clean up now-unused imports.

core/tests.py B2U tests are left as-is since they test core model
methods that still exist (deferred to Phase 3).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- FAQ: Replace three-campaign-types intro with Thanks-for-Ungluing
  focus, add short legacy note for Pledge/B2U. Remove Pledge Campaign
  FAQ, Premiums FAQ, B2U-specific ebook Q&As, and detailed Rights
  Holder pledge/B2U sections. Simplify Funding and Conversion sections.
  Fix stale campaign_list links (pledge/b2u → t4u).
- learn_more.html: Remove Buy-to-Unglue and Pledge-to-Unglue program
  rows from landing page "how it works" panel (keep Thanks only).
- programs.html: Remove Pledge-to-Unglue and Buy-to-Unglue program
  terms (keep Thanks-for-Ungluing terms only).
- faq_b2u.html: Replace detailed B2U FAQ with short legacy note.

Part of #1081 (strip REWARDS/B2U campaign types).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revise FAQ and landing page for post-campaign era
- Delete sysadmin/ (superseded by regluit-provisioning/Ansible)
- Delete test/selenium-server-standalone-2.24.1.jar (obsolete binary)
- Gitignore: settings/prod.py, settings/local.py, settings/aws.py (Ansible-generated
  or local-only, should never be committed)
- Gitignore: credential-adjacent files (iam_keys, *.pem, *.der, etc.)
- Gitignore: IDE configs (.idea/, *.komodoproject)
- Gitignore: test drivers/binaries, test-data/, venv/, deploy/prod.wsgi

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Resolve conflicts in faq.html and faq_b2u.html by taking master's version
(PR #1085 FAQ rewrite supersedes #1084's changes to those files).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Strip REWARDS + BUY2UNGLUE: Phase 1 — routes, views, templates
limit=None causes 'int >= NoneType' TypeError in load_doab_oai(),
crashing every unattended run. Guard the comparison so limit=None
means no cap (original intent).

Fixes #1096

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
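
The limit=None guard described above can be sketched as follows. This is a minimal illustration, not the code in load_doab_oai(); the helper name take_records and its signature are hypothetical.

```python
def take_records(records, limit=None):
    """Yield records, stopping after `limit` items. limit=None means no cap."""
    count = 0
    for record in records:
        # Compare only when a cap is set: on Python 3, `count >= None`
        # raises TypeError, which is the failure described above.
        if limit is not None and count >= limit:
            break
        yield record
        count += 1
```

Testing `limit is not None` rather than truthiness keeps limit=0 meaningful (it yields nothing) while None still disables the cap.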
When DOAB returns 429 Too Many Requests, pyoai raises
urllib.error.HTTPError which previously propagated unhandled, crashing
the management command without printing the "loaded N records" summary.
The Retry-After header (typically 86400s = 24h) was silently discarded.

Now catches 429 specifically, logs the Retry-After value and how many
records were harvested before being cut off, then returns normally.
Any other HTTPError is still re-raised.

Fixes #1100

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
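
A rough sketch of the 429 handling described above, assuming the HTTPError surfaces while iterating the harvested records. The function name harvest and the handle callback are hypothetical stand-ins, not the management command's actual structure.

```python
import logging
import urllib.error

logger = logging.getLogger(__name__)


def harvest(records, handle):
    """Process records until exhausted or rate-limited; return the count handled."""
    loaded = 0
    try:
        for record in records:
            handle(record)
            loaded += 1
    except urllib.error.HTTPError as err:
        if err.code != 429:
            raise  # any other HTTP error still propagates
        headers = err.headers
        retry_after = headers.get("Retry-After") if headers is not None else None
        logger.warning(
            "rate limited (429) after %d records; Retry-After=%s",
            loaded, retry_after,
        )
    return loaded
```

Returning the count on the 429 path lets the caller still print its "loaded N records" summary.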
Changed from 15s to 60s read timeout per Eric's feedback that 15s
is too aggressive for some publisher servers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three pre-existing bugs in the DOAB loader:

1. groups(1) -> group(1) in Springer FTP cover handling
   re.Match.groups() returns a tuple; .format(tuple) produces a broken
   URL like '.../('filename.jpg',)/...'. group(1) returns the string.

2. Non-FTP 302 redirect fetched url (original) instead of redirurl
   (the redirect destination). Now correctly follows the redirect.

3. item_type filter was case-sensitive and checked only the first list
   element via unlist(). Records where DOAB returns ['Book'] or
   ['peer-reviewed', 'book'] were silently skipped. Now checks the
   full list case-insensitively.

Fixes #1102

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
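
Bugs 1 and 3 above are easy to reproduce in isolation. In the sketch below, the regex, the URL template, and the is_book helper are hypothetical; only the group()/groups() behavior comes from the standard library, and the exact comparison used in the loader is an assumption.

```python
import re

# Hypothetical pattern and URL template, for illustration only.
COVER_RE = re.compile(r"covers/(.+\.jpg)$")
TEMPLATE = "https://example.invalid/covers/{}/full"

match = COVER_RE.search("ftp://example.invalid/covers/filename.jpg")

# groups() takes a *default* value, not an index, and returns a tuple of
# all captured groups, so the tuple's repr ends up inside the URL.
broken = TEMPLATE.format(match.groups(1))
# group(1) returns the first captured group as a string.
fixed = TEMPLATE.format(match.group(1))


def is_book(item_types):
    """Return True if any entry equals 'book', ignoring case (assumed rule)."""
    if item_types is None:
        return False
    if isinstance(item_types, str):
        item_types = [item_types]
    return any(t.lower() == "book" for t in item_types)
```

With these definitions, `is_book(['Book'])` and `is_book(['peer-reviewed', 'book'])` both return True, which covers the two record shapes that were being skipped.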
…_streamdata

Two reliability improvements:

1. Per-record exception isolation in load_doab_oai()
   add_by_doab() was called with no try/except. A DB constraint
   violation or unexpected metadata error at any record aborted the
   entire remaining harvest window silently. Now catches Exception,
   logs the DOAB ID and error, and continues to the next record.

2. Explicit 429 handling in get_streamdata()
   Previously a 429 response caused response.json() to raise
   ValueError (DOAB returns HTML on rate-limit), logged only as
   "decoder error" — masking the true cause. Now checks status code
   before calling .json() and logs a clear rate-limit error with the
   Retry-After value.

Note: surfacing partial 429 harvests in management command output
requires coordination with PR #1101 and is tracked in #1105.

Fixes #1105 (partial — management command visibility pending #1101)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
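
Both improvements above follow common patterns, sketched here under stated assumptions. load_all, add_record, and parse_json_response are hypothetical names, and `response` is assumed to be an object shaped like requests.Response (status_code, headers, json()).

```python
import logging

logger = logging.getLogger(__name__)


def load_all(doab_ids, add_record):
    """Call add_record for each ID; log and skip failures instead of aborting."""
    loaded, failed = 0, []
    for doab_id in doab_ids:
        try:
            add_record(doab_id)
            loaded += 1
        except Exception as err:  # deliberately broad: one bad record must not stop the run
            logger.error("failed to load DOAB record %s: %s", doab_id, err)
            failed.append(doab_id)
    return loaded, failed


def parse_json_response(response):
    """Return decoded JSON, or None when the server signals rate limiting."""
    # Check the status first: a 429 body may be HTML, and calling json()
    # on it would raise a decode error that hides the real cause.
    if response.status_code == 429:
        logger.error(
            "rate limited (429); Retry-After=%s",
            response.headers.get("Retry-After"),
        )
        return None
    return response.json()
```

Returning the list of failed IDs alongside the count is a design choice of this sketch; it makes skipped records visible to the caller rather than only to the log.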
- ContentTyper: timeout=(5, 60) on all 4 requests calls
- load_ebookfile: timeout=(10, 60) on 3 requests calls (file downloads)
- get_soup: timeout=(10, 30) + Timeout exception handler

Read timeout changed from 15s to 60s per Eric's feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
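
For reference, requests accepts a (connect, read) tuple for its timeout argument. The sketch below uses the values listed above; the TIMEOUTS table, its keys, and the fetch helper are hypothetical and do not reflect how the three call sites are actually organized.

```python
import requests

# (connect timeout, read timeout) in seconds, using the values listed above.
TIMEOUTS = {
    "content_typer": (5, 60),
    "ebook_file": (10, 60),
    "soup": (10, 30),
}


def fetch(url, kind):
    """GET url with the timeout pair for `kind`; return None if it times out."""
    try:
        return requests.get(url, timeout=TIMEOUTS[kind])
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout.
        return None
```

The read timeout applies to each wait for data from the server, not to the whole download, so a large file that keeps streaming is not cut off at 60 seconds.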
Merge as part of DOAB harvester fix rollout (Wave 1)
Add 60s timeout to DOAB cover image HTTP requests
Handle DOAB OAI HTTP 429 rate-limit gracefully
Fix Springer cover URL bug, redirect target bug, and item_type filter
Add timeouts to ContentTyper, load_ebookfile, and get_soup HTTP calls
DOAB harvest reliability: per-record exception isolation and 429 in get_streamdata
2 participants