A GitHub Action that validates web links in HTML files and offers AI-powered suggestions for fixing them. It goes beyond traditional link checkers by recommending replacements for broken or redirected links and by handling modern web challenges such as bot blocking and rate limiting.
- AI-Powered Suggestions: Intelligent recommendations for broken or redirected links
- Smart Detection: Bot-blocking awareness and enhanced robustness
- Performance Optimized: Respectful rate limiting and efficient scanning
- Flexible Modes: Full project or PR-changed files scanning
- Highly Configurable: Custom timeouts, status codes, and behaviors
- Rich Reporting: GitHub issues, artifacts, and detailed JSON output
- Documentation Ready: Perfect for Jupyter Book and documentation sites
- Smart Link Validation: Checks external web links in HTML files with configurable timeout and redirect handling
- Enhanced Robustness: Intelligent detection of bot-blocked sites to reduce false positives
- AI-Powered Suggestions: Provides intelligent recommendations for broken or redirected links
- Two Scanning Modes: Full project scan or PR-specific changed files only
- Configurable Status Codes: Define which HTTP status codes to silently report (e.g., 403, 503)
- Redirect Detection: Identifies and suggests updates for redirected links
- GitHub Integration: Creates issues, PR comments, and workflow artifacts
- MyST Markdown Support: Works with Jupyter Book projects by scanning HTML output
- Performance Optimized: Respectful rate limiting, improved timeouts, and efficient scanning
Minimal step using the defaults:

    - name: Check links in documentation
      uses: QuantEcon/action-link-checker@v1

Weekly scheduled workflow:

    name: Weekly Link Check
    on:
      schedule:
        - cron: '0 9 * * 1'  # Monday at 9 AM UTC
      workflow_dispatch:
    jobs:
      link-check:
        runs-on: ubuntu-latest
        permissions:
          contents: read
          issues: write
        steps:
          - uses: actions/checkout@v4
            with:
              ref: gh-pages  # Check the published site
          - name: AI-powered link check
            uses: QuantEcon/action-link-checker@v1
            with:
              html-path: '.'
              mode: 'full'
              fail-on-broken: 'false'
              create-issue: 'true'
              ai-suggestions: 'true'
              silent-codes: '403,503'
              issue-title: 'Weekly Link Check Report'
              notify: 'maintainer1,maintainer2'

Pull request workflow:

    name: PR Link Check
    on:
      pull_request:
        branches: [ main ]
    jobs:
      link-check:
        runs-on: ubuntu-latest
        permissions:
          contents: read
          pull-requests: write
        steps:
          - uses: actions/checkout@v4
          - name: Build documentation
            run: jupyter-book build .
          - name: Check links in changed files
            uses: QuantEcon/action-link-checker@v1
            with:
              html-path: './_build/html'
              mode: 'changed'
              fail-on-broken: 'true'
              ai-suggestions: 'true'
              silent-codes: '403,503'

Step with every input set:

    - name: Comprehensive link checking
      uses: QuantEcon/action-link-checker@v1
      with:
        html-path: './_build/html'
        mode: 'full'
        silent-codes: '403,503,429'
        fail-on-broken: 'false'
        ai-suggestions: 'true'
        create-issue: 'true'
        issue-title: 'Link Check Report - Broken Links Found'
        create-artifact: 'true'
        artifact-name: 'detailed-link-report'
        notify: 'team-lead,docs-maintainer'
        timeout: '30'
        max-redirects: '5'

The action includes intelligent logic to reduce false positives for legitimate sites:
- Major Sites: Automatically detects common sites that block automated requests (Netflix, Amazon, Facebook, etc.)
- Encoding Issues: Identifies encoding errors that often indicate bot protection
- Status Code Analysis: Recognizes rate limiting (429) and bot blocking patterns
- Silent Reporting: Marks likely bot-blocked sites as silent instead of broken
- Browser-like Headers: Uses realistic browser headers to reduce blocking
- Increased Timeout: Default 45-second timeout for slow-loading legitimate sites
- Smart Error Handling: Distinguishes between genuine broken links and temporary blocks
- Constructive Suggestions: Only suggests fixes, not removals, for legitimate domains
- Manual Review: Suggests manual verification for unknown domains instead of automatic removal
- Domain Whitelist: Recognizes trusted domains (GitHub, Python.org, etc.) and handles them appropriately
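
As a rough illustration of the behaviour described above, the following Python sketch classifies a checked link as `ok`, `silent`, or `broken`. It is not the action's actual implementation: the function name `classify_link`, the domain set, and the choice of 403 and 429 as bot-blocking signals are assumptions made for this example.

```python
from urllib.parse import urlparse

# Hypothetical sample; the action's real list of bot-blocking sites is not shown here.
LIKELY_BOT_BLOCKING_DOMAINS = {"netflix.com", "amazon.com", "facebook.com"}

# Assumption: these statuses are treated as signs of bot blocking or rate limiting.
BOT_BLOCK_STATUS_CODES = {403, 429}


def classify_link(url, status_code, silent_codes=frozenset({403, 503})):
    """Return 'ok', 'silent', or 'broken' for the final status of a checked link."""
    if 200 <= status_code < 300:
        return "ok"
    # Codes listed in silent-codes are reported without counting as broken.
    if status_code in silent_codes:
        return "silent"
    host = urlparse(url).hostname or ""
    # Match the listed domain itself or any of its subdomains.
    known_blocker = any(
        host == domain or host.endswith("." + domain)
        for domain in LIKELY_BOT_BLOCKING_DOMAINS
    )
    if known_blocker and status_code in BOT_BLOCK_STATUS_CODES:
        return "silent"
    return "broken"
```

Under these assumptions, a 429 from `www.netflix.com` is reported silently, while the same status from an unknown domain still counts as broken unless `429` is added to the silent codes.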
The action includes intelligent analysis that can suggest:
- HTTPS Upgrades: Detects `http://` links that should be `https://`
- GitHub Branch Updates: Finds `/master/` links that should be `/main/`
- Documentation Migrations: Suggests updated URLs for moved documentation sites
- Version Updates: Recommends newer versions of deprecated documentation
- Final Destination: Suggests updating redirected links to their final destination
- Performance: Eliminates unnecessary redirect chains
- Reliability: Reduces dependency on redirect services

Example suggestions:

    http://docs.python.org/2.7/library/urllib.html
      Issue: Broken link (Status: 404)
      version_update: https://docs.python.org/3/library/urllib.html
      Reason: Python 2.7 is deprecated, consider Python 3 documentation

    http://github.com/user/repo/blob/master/README.md
      Issue: Redirected 1 times
      redirect_update: https://github.com/user/repo/blob/main/README.md
      Reason: GitHub default branch changed from master to main

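
The rule-based suggestions can be pictured with a small Python sketch. This is an illustration under stated assumptions rather than the action's real code: `suggest_fix` and the type names `https_upgrade` and `branch_update` are hypothetical (only `redirect_update` and `version_update` appear in the sample output), and a real checker would presumably confirm that a suggested URL resolves before reporting it.

```python
from urllib.parse import urlsplit, urlunsplit


def suggest_fix(url, final_url=None):
    """Return (suggestion_type, suggested_url, reason), or None if no rule applies.

    final_url is the destination reached after following redirects, if any.
    """
    # Rule 1: a redirected link should point at its final destination.
    if final_url and final_url != url:
        return ("redirect_update", final_url,
                "Link redirects; update it to the final destination")

    parts = urlsplit(url)

    # Rule 2 (hypothetical name): GitHub links that still use the old default branch.
    if parts.netloc.lower() == "github.com" and "/master/" in parts.path:
        new_path = parts.path.replace("/master/", "/main/", 1)
        candidate = urlunsplit(("https", parts.netloc, new_path,
                                parts.query, parts.fragment))
        return ("branch_update", candidate,
                "GitHub default branch may have changed from master to main")

    # Rule 3 (hypothetical name): plain http links that could use https.
    if parts.scheme == "http":
        candidate = urlunsplit(("https",) + tuple(parts[1:]))
        return ("https_upgrade", candidate, "Prefer https:// over http://")

    return None
```

For example, `suggest_fix("http://example.com/page")` proposes `https://example.com/page`, and a URL that is already on `https` with no redirect yields `None`.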
- File Discovery: Scans HTML files in the specified directory
- Link Extraction: Uses BeautifulSoup to extract all external links
- Link Validation: Checks each link with configurable timeout and redirect handling
- AI Analysis: Applies rule-based AI to suggest improvements
- Reporting: Creates detailed reports with actionable suggestions
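
The first two steps can be sketched in Python as follows. The action is described as using BeautifulSoup; to keep this example dependency-free it swaps in the standard library's `html.parser` instead, and the names `ExternalLinkParser`, `extract_external_links`, and `scan_directory` are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class ExternalLinkParser(HTMLParser):
    """Collect href values of <a> tags that point to http(s) URLs."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Relative paths and in-page anchors are skipped as non-external.
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_external_links(html_text):
    """Return external links in document order."""
    parser = ExternalLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


def scan_directory(html_path):
    """Map each .html file under html_path to the external links it contains."""
    results = {}
    for path in sorted(Path(html_path).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        results[str(path)] = extract_external_links(text)
    return results
```

Calling `scan_directory("./_build/html")` would then give a per-file list of URLs that the validation step can check.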

Full mode (`mode: 'full'`):
- Scans all HTML files in the target directory
- Ideal for scheduled weekly scans
- Comprehensive coverage of the entire project

Changed mode (`mode: 'changed'`):
- Only scans HTML files that changed in the current PR
- Efficient for PR-triggered workflows
- Falls back to a full scan if no changes are detected
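
A minimal sketch of the mode selection, assuming the caller already has the list of built HTML files and a list of changed paths expressed in the same form (how the action maps PR changes to HTML files is not described here). `select_files` is a hypothetical helper.

```python
def select_files(all_html_files, changed_files, mode="full"):
    """Pick the HTML files to scan for the given mode ('full' or 'changed')."""
    if mode not in ("full", "changed"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "full":
        return sorted(all_html_files)
    # Changed mode: keep only HTML files that also appear in the changed list.
    changed_html = sorted(set(all_html_files) & set(changed_files))
    # Fall back to a full scan when no changed HTML files are detected.
    return changed_html if changed_html else sorted(all_html_files)
```

The fallback keeps a PR run from silently checking nothing when the change list is empty.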
Configure which HTTP status codes should be reported without failing:

    silent-codes: '403,503,429,502'

Common codes to consider:
- `403`: Forbidden (often due to bot detection)
- `503`: Service Unavailable (temporary outages)
- `429`: Too Many Requests (rate limiting)
- `502`: Bad Gateway (temporary server issues)
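
To make the effect of `silent-codes` concrete, here is a hedged Python sketch of how such a comma-separated value might be parsed and applied. The helpers `parse_silent_codes` and `should_fail` are assumptions for illustration, not the action's API.

```python
def parse_silent_codes(raw):
    """Parse a value like '403,503,429' into a set of integer status codes."""
    codes = set()
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas and extra spaces
        code = int(token)  # raises ValueError for non-numeric input
        if not 100 <= code <= 599:
            raise ValueError(f"not an HTTP status code: {code}")
        codes.add(code)
    return codes


def should_fail(results, silent_codes, fail_on_broken=True):
    """Decide whether the run fails, given (url, status_code) pairs.

    Error statuses listed in silent_codes are reported but never cause a failure.
    """
    broken = [url for url, status in results
              if status >= 400 and status not in silent_codes]
    return fail_on_broken and bool(broken)
```

With `silent_codes = parse_silent_codes('403,503')`, a run that only sees 403 responses passes, while a single 404 fails it unless `fail_on_broken` is off.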

Timeout and redirect limits:

    timeout: '30'  # Timeout per link in seconds
    max-redirects: '5'  # Maximum redirects to follow

Before (using lychee):

    - name: Link Checker
      uses: lycheeverse/lychee-action@v2
      with:
        fail: false
        args: --accept 403,503 *.html

After (using AI-powered link checker):

    - name: AI-Powered Link Checker
      uses: QuantEcon/action-link-checker@v1
      with:
        html-path: '.'
        fail-on-broken: 'false'
        silent-codes: '403,503'
        ai-suggestions: 'true'
        create-issue: 'true'

For Jupyter Book projects:

    - name: Build Jupyter Book
      run: jupyter-book build lectures/
    - name: Check links in built documentation
      uses: QuantEcon/action-link-checker@v1
      with:
        html-path: './lectures/_build/html'
        mode: 'full'
        ai-suggestions: 'true'

Use action outputs in subsequent workflow steps:

    - name: Check links
      id: link-check
      uses: QuantEcon/action-link-checker@v1
      with:
        fail-on-broken: 'false'
    - name: Report results
      run: |
        echo "Broken links: ${{ steps.link-check.outputs.broken-link-count }}"
        echo "Redirects: ${{ steps.link-check.outputs.redirect-count }}"
        echo "AI suggestions available: ${{ steps.link-check.outputs.ai-suggestions != '' }}"

Required workflow permissions depend on features used:

    permissions:
      contents: read  # Always required
      issues: write  # For create-issue: 'true'
      pull-requests: write  # For PR comments
      actions: read  # For create-artifact: 'true'

| Input | Description | Required | Default |
|---|---|---|---|
| `html-path` | Path to HTML files directory | No | `./_build/html` |
| `mode` | Scan mode: `full` or `changed` | No | `full` |
| `silent-codes` | HTTP codes to silently report | No | `403,503` |
| `fail-on-broken` | Fail workflow on broken links | No | `true` |
| `ai-suggestions` | Enable AI-powered suggestions | No | `true` |
| `create-issue` | Create GitHub issue for broken links | No | `false` |
| `issue-title` | Title for created issues | No | `Broken Links Found in Documentation` |
| `create-artifact` | Create workflow artifact | No | `false` |
| `artifact-name` | Name for workflow artifact | No | `link-check-report` |
| `notify` | Users to assign to created issue | No | (empty) |
| `timeout` | Timeout per link (seconds) | No | `45` |
| `max-redirects` | Maximum redirects to follow | No | `5` |

| Output | Description |
|---|---|
| `broken-links-found` | Whether broken links were found |
| `broken-link-count` | Number of broken links |
| `redirect-count` | Number of redirects found |
| `link-details` | Detailed broken link information |
| `ai-suggestions` | AI-powered improvement suggestions |
| `issue-url` | URL of created GitHub issue |
| `artifact-path` | Path to created artifact file |

- Weekly Scans: Use scheduled workflows for comprehensive link checking
- PR Validation: Use changed-file mode for efficient PR validation
- Status Code Configuration: Adjust silent codes based on your links' typical behavior
- AI Suggestions: Review and apply AI suggestions to improve link quality
- Issue Management: Use automatic issue creation for tracking broken links
- Performance: Set appropriate timeouts based on your link destinations
- Timeout Errors: Increase `timeout` value for slow-responding sites (default is now 45s)
- False Positives: The action automatically detects major sites that block bots (Netflix, Amazon, etc.)
- Rate Limiting: Add `429` to `silent-codes` for rate-limited sites
- Bot Blocking: Legitimate sites blocking automated requests are automatically handled gracefully
- Large Repositories: Use `changed` mode for PR workflows
If legitimate links are being flagged as broken:
- Check if it's a major site: Netflix, Amazon, Facebook, etc. are automatically detected as likely bot-blocked
- Increase timeout: Use `timeout: '60'` for slower sites like tutorials or educational content
- Add to silent codes: If a site consistently returns specific error codes, add them to `silent-codes`
- Review AI suggestions: The action provides constructive fix suggestions rather than suggesting removal
The action provides detailed logging including:
- Number of files scanned
- Links found per file
- Status codes and errors
- AI suggestion reasoning
This action can directly replace lychee workflows with enhanced functionality:
- Replace `lycheeverse/lychee-action` with this action
- Update input parameters (see comparison above)
- Add AI suggestions and issue creation features
- Configure silent status codes as needed
The enhanced AI capabilities provide value beyond basic link checking by suggesting improvements and maintaining link quality over time.