`.jules/sentinel.md` (4 additions, 0 deletions)
@@ -0,0 +1,4 @@
## 2025-03-05 - [XML External Entity (XXE) Prevention]
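**Vulnerability:** Found `xml.etree.ElementTree` being used to parse untrusted external RSS feeds in the scrapers (`theverge.py` and `producthunt.py`). This standard-library module is not hardened against malicious XML: the Python documentation lists it as vulnerable to entity-expansion attacks such as Billion Laughs and quadratic blowup (unless the bundled Expat is 2.4.1 or newer), and, unlike `defusedxml`, it offers no switch to reject DTD and entity declarations outright.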
**Learning:** External feeds must always be treated as untrusted data. Standard XML parsers often do not protect against nested entity expansion or external entity resolution.
**Prevention:** Always use `defusedxml.ElementTree` instead of the standard library `xml.etree.ElementTree` when parsing untrusted XML/RSS feeds to prevent XML-based attacks.

Review comment (priority: medium):

To make this prevention advice more actionable for future developers, consider adding a hyperlink to the defusedxml library's documentation or PyPI page. This will make it easier for others to quickly find and learn about the recommended library.

Suggested change
-**Prevention:** Always use `defusedxml.ElementTree` instead of the standard library `xml.etree.ElementTree` when parsing untrusted XML/RSS feeds to prevent XML-based attacks.
+**Prevention:** Always use [`defusedxml.ElementTree`](https://pypi.org/project/defusedxml/) instead of the standard library `xml.etree.ElementTree` when parsing untrusted XML/RSS feeds to prevent XML-based attacks.
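To see what the Prevention note guards against, here is a minimal sketch assuming `defusedxml` 0.7.x, as pinned in `functions/requirements.txt`. It passes a small nested-entity payload to `defusedxml.ElementTree.fromstring`; with the default `forbid_entities=True`, defusedxml refuses any entity declaration and raises `EntitiesForbidden`, a subclass of `DefusedXmlException`. The sample feeds and the `rejection_reason` helper are illustrative and do not come from the scrapers.

```python
from typing import Optional

import defusedxml.ElementTree as ET
from defusedxml import DefusedXmlException

# Illustrative "Billion Laughs"-style payload: nested internal entities that
# would expand exponentially in a parser without entity limits.
NESTED_ENTITY_FEED = """<?xml version="1.0"?>
<!DOCTYPE rss [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
]>
<rss><channel><title>&lol3;</title></channel></rss>"""

# A feed without a DTD parses normally.
PLAIN_FEED = "<rss><channel><title>Hello</title></channel></rss>"


def rejection_reason(xml_text: str) -> Optional[str]:
    """Return the name of the defusedxml exception raised, or None if parsed."""
    try:
        ET.fromstring(xml_text)
    except DefusedXmlException as exc:
        return type(exc).__name__
    return None


print(rejection_reason(NESTED_ENTITY_FEED))  # EntitiesForbidden
print(rejection_reason(PLAIN_FEED))  # None
```

Catching the `DefusedXmlException` base class, rather than each subclass, keeps the handler valid if a feed trips a different defusedxml check (for example `DTDForbidden` when `forbid_dtd=True` is passed).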

`functions/requirements.txt` (1 addition, 0 deletions)
@@ -7,3 +7,4 @@ beautifulsoup4==4.*
feedparser==6.*
openai==1.*
tzdata
+defusedxml==0.7.*
`functions/scrapers/producthunt.py` (1 addition, 1 deletion)
@@ -6,7 +6,7 @@
import httpx
from typing import List, Dict, Any
from datetime import datetime
-import xml.etree.ElementTree as ET
+import defusedxml.ElementTree as ET
from bs4 import BeautifulSoup


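Because the change only swaps the module behind the `ET` alias, call sites such as `ET.fromstring(...)` and `except ET.ParseError` keep working unchanged. Below is a minimal sketch of that pattern, assuming an RSS 2.0 feed with `<item>` elements; `parse_feed_items` and the sample feed are hypothetical and not taken from `producthunt.py`. The swap also assumes the scraper only calls parsing entry points: in the 0.7.x releases, `defusedxml.ElementTree` re-exports `fromstring`, `parse`, `iterparse`, `tostring`, and `ParseError`, but not tree-building helpers such as `Element` or `SubElement`.

```python
from typing import Dict, List

import defusedxml.ElementTree as ET  # same alias as in the scraper
from defusedxml import DefusedXmlException

SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>Launch A</title><link>https://example.com/a</link></item>
  <item><title>Launch B</title><link>https://example.com/b</link></item>
</channel></rss>"""


def parse_feed_items(xml_text: str) -> List[Dict[str, str]]:
    """Hypothetical helper: extract title/link pairs from RSS 2.0 <item>s.

    Returns an empty list for malformed XML (ET.ParseError) or for documents
    that defusedxml refuses to parse (DefusedXmlException).
    """
    try:
        root = ET.fromstring(xml_text)
    except (ET.ParseError, DefusedXmlException):
        return []
    items = []
    for item in root.iter("item"):
        items.append(
            {
                # findtext returns the default when the child element is missing
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
            }
        )
    return items


print(parse_feed_items(SAMPLE_FEED))
print(parse_feed_items("<rss><channel>"))  # truncated XML, so []
```

Returning an empty list keeps one bad feed from aborting a whole scrape run; whether that fits the real scraper depends on how it reports failures, which this sketch does not model.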
`functions/scrapers/theverge.py` (1 addition, 1 deletion)
@@ -6,7 +6,7 @@
import httpx
from typing import List, Dict, Any
from datetime import datetime
-import xml.etree.ElementTree as ET
+import defusedxml.ElementTree as ET

try:
    from ..resilience import retry_with_backoff
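The same alias swap covers `theverge.py`. One design point worth considering, since this file also imports `retry_with_backoff`: a feed that defusedxml refuses is not a transient failure, so retrying it would only repeat the refusal. The sketch below uses a hypothetical `UnsafeFeedError` and `parse_or_raise` helper (neither is part of the scraper or the `resilience` module) to show a classic XXE payload being rejected and re-raised as a distinct error that a retry wrapper could choose not to retry.

```python
import defusedxml.ElementTree as ET
from defusedxml import DefusedXmlException

# Classic XXE payload: an external entity pointing at a local file.
XXE_FEED = """<?xml version="1.0"?>
<!DOCTYPE rss [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<rss><channel><title>&xxe;</title></channel></rss>"""


class UnsafeFeedError(Exception):
    """Hypothetical error type: the feed was refused for security reasons."""


def parse_or_raise(xml_text: str):
    """Parse a feed, converting defusedxml refusals into UnsafeFeedError.

    A retry wrapper can re-raise UnsafeFeedError immediately instead of
    backing off, because the same document is refused on every attempt.
    """
    try:
        return ET.fromstring(xml_text)
    except DefusedXmlException as exc:
        raise UnsafeFeedError(f"refused feed: {type(exc).__name__}") from exc


try:
    parse_or_raise(XXE_FEED)
except UnsafeFeedError as err:
    print(err)  # refused feed: EntitiesForbidden
```

With default settings the refusal happens at the entity declaration itself, before any attempt to read `file:///etc/passwd`, which is why the error is `EntitiesForbidden` rather than an external-reference error.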