4 changes: 4 additions & 0 deletions .jules/sentinel.md
@@ -0,0 +1,4 @@
## 2025-02-27 - [Fix XML Parsing Vulnerabilities]
**Vulnerability:** Found `xml.etree.ElementTree` being used in `functions/scrapers/theverge.py` and `functions/scrapers/producthunt.py` to parse potentially untrusted XML feeds, exposing the system to XML External Entity (XXE) and Billion Laughs attacks.
**Learning:** External feeds are inherently untrusted, and the Python documentation warns that the standard-library XML modules are not secure against maliciously constructed data, so parsing untrusted input with them directly is a known vulnerability.
**Prevention:** Always use `defusedxml.ElementTree` when parsing any XML or RSS feed that is fetched from the internet. Do not use the standard `xml.etree.ElementTree` directly.
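As a minimal sketch of the Prevention rule, the snippet parses a small RSS document with `defusedxml.ElementTree.fromstring` and shows that a truncated Billion Laughs payload is rejected with `defusedxml.EntitiesForbidden` before any expansion happens. The feed strings and the `parse_items` helper are hypothetical; they are not taken from `theverge.py` or `producthunt.py`.

```python
import defusedxml.ElementTree as ET
from defusedxml import EntitiesForbidden

# Hypothetical stand-in for a fetched RSS feed.
SAFE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""

# A truncated Billion Laughs payload: each entity expands to ten copies of the
# previous one, so a few more levels would exhaust memory in a naive parser.
BILLION_LAUGHS = """<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
]>
<rss><channel><item><title>&lol3;</title></item></channel></rss>"""


def parse_items(xml_text: str) -> list[dict[str, str]]:
    """Extract title/link pairs from RSS <item> elements.

    defusedxml.ElementTree.fromstring forbids entity declarations by default
    (forbid_entities=True), so hostile input raises EntitiesForbidden instead
    of being expanded.
    """
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    print(parse_items(SAFE_FEED))
    try:
        parse_items(BILLION_LAUGHS)
    except EntitiesForbidden as exc:
        print(f"Rejected feed: {exc!r}")
```

Letting the exception propagate, rather than returning an empty list, keeps a hostile feed distinguishable from an empty one; a scraper could catch `defusedxml.DefusedXmlException`, the common base class, and log which source misbehaved.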
1 change: 1 addition & 0 deletions functions/requirements.txt
@@ -7,3 +7,4 @@ beautifulsoup4==4.*
feedparser==6.*
openai==1.*
tzdata
+defusedxml==0.*


Priority: medium

For a security-critical library like `defusedxml`, it is best practice to pin the version more tightly than `==0.*`. The loose pin permits an older, potentially less secure 0.x release and makes builds less reproducible. A compatible-release specifier locks the dependency to a known-good version while still allowing non-breaking patch updates: `~=0.7.1` is equivalent to `>=0.7.1, ==0.7.*`.

defusedxml~=0.7.1
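To make the reviewer's point concrete, here is a simplified sketch of the PEP 440 compatible-release rule, restricted to purely numeric versions (no pre-, post-, or dev-release tags). The helper names are hypothetical; a real project would rely on pip or the `packaging` library rather than reimplementing version matching.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Split a purely numeric release string such as "0.7.1" into integers."""
    return tuple(int(part) for part in version.split("."))


def _pad(parts: tuple[int, ...], width: int) -> tuple[int, ...]:
    # PEP 440 treats missing trailing components as zeros (0.7 == 0.7.0).
    return parts + (0,) * (width - len(parts))


def compatible_release_allows(spec: str, candidate: str) -> bool:
    """Simplified PEP 440 "~=" check for numeric release versions only.

    "~=0.7.1" means ">=0.7.1, ==0.7.*": the last component may increase,
    but every component before it must match exactly.
    """
    base = parse_release(spec)
    if len(base) < 2:
        raise ValueError('"~=" needs at least two release components')
    cand = parse_release(candidate)
    width = max(len(base), len(cand))
    base_p, cand_p = _pad(base, width), _pad(cand, width)
    prefix = base[:-1]
    return cand_p[: len(prefix)] == prefix and cand_p >= base_p


if __name__ == "__main__":
    for version in ["0.6.2", "0.7.0", "0.7.1", "0.7.4", "0.8.0"]:
        print(version, compatible_release_allows("0.7.1", version))
```

Under `~=0.7.1`, versions 0.7.1 and 0.7.4 are accepted while 0.6.2, 0.7.0, and 0.8.0 are rejected; the original `==0.*` pin would accept all five.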

3 changes: 2 additions & 1 deletion functions/scrapers/producthunt.py
@@ -6,7 +6,8 @@
import httpx
from typing import List, Dict, Any
from datetime import datetime
-import xml.etree.ElementTree as ET
+# Security: Use defusedxml to prevent XXE and Billion Laughs attacks when parsing untrusted RSS feeds
+import defusedxml.ElementTree as ET
from bs4 import BeautifulSoup


3 changes: 2 additions & 1 deletion functions/scrapers/theverge.py
@@ -6,7 +6,8 @@
import httpx
from typing import List, Dict, Any
from datetime import datetime
-import xml.etree.ElementTree as ET
+# Security: Use defusedxml to prevent XXE and Billion Laughs attacks when parsing untrusted RSS feeds
+import defusedxml.ElementTree as ET

try:
    from ..resilience import retry_with_backoff
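One way to keep a future scraper from reintroducing the stdlib parser, and so enforce the Prevention rule recorded in `.jules/sentinel.md`, is a small static check. This sketch uses Python's `ast` module to flag direct imports of `xml.etree`; the `find_stdlib_etree_imports` and `scan_tree` helpers and the policy itself are hypothetical, not part of this PR.

```python
import ast
from pathlib import Path

# Hypothetical policy: flag any direct import from the stdlib xml.etree package.
BANNED_PACKAGE = "xml.etree"


def _is_banned(module: str) -> bool:
    return module == BANNED_PACKAGE or module.startswith(BANNED_PACKAGE + ".")


def find_stdlib_etree_imports(source: str, filename: str = "<string>") -> list[int]:
    """Return sorted line numbers of imports that pull in xml.etree directly."""
    tree = ast.parse(source, filename=filename)
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # e.g. "import xml.etree.ElementTree as ET"
            if any(_is_banned(alias.name) for alias in node.names):
                lines.append(node.lineno)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # e.g. "from xml.etree import ElementTree" or "from xml import etree"
            if _is_banned(node.module) or (
                node.module == "xml" and any(a.name == "etree" for a in node.names)
            ):
                lines.append(node.lineno)
    return sorted(lines)


def scan_tree(root: Path) -> dict[str, list[int]]:
    """Map each .py file under root to its offending line numbers, if any."""
    findings = {}
    for path in sorted(root.rglob("*.py")):
        hits = find_stdlib_etree_imports(path.read_text(encoding="utf-8"), str(path))
        if hits:
            findings[str(path)] = hits
    return findings
```

A pytest case could then assert `scan_tree(Path("functions/scrapers")) == {}`. The rule is deliberately blunt: it would also flag legitimate uses such as building XML for output, so a real check might need an allow-list.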