feat: support new LeetCode discuss API (post-March 2025) #62

anxkhn · 2025-09-12T05:58:53Z

LeetCode has changed its data structure for compensation posts starting March 2025, which makes our current parsing logic obsolete. The new GraphQL query used by https://leetcode.com/discuss/ (ugcArticleDiscussionArticles) only returns a summary instead of the full content of posts.

After investigating, there seems to be no available GraphQL query that returns the complete description for these new post IDs.

Approach

Introduced a new query (COMP_POSTS_QUERY) for fetching posts created after March 2025.
Retained the existing query as COMP_POSTS_QUERY_LEGACY for older posts.
Implemented a date-based switch:
- If the post date is after March 1, 2025 → use the new query.
- Otherwise → use the legacy query.
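A minimal sketch of the date-based switch described above, not the code in this PR. The query strings are placeholders, and `NEW_API_CUTOFF` and `select_query` are hypothetical names. Treating the boundary as midnight UTC on March 1, 2025, and treating naive timestamps as UTC, are both assumptions.

```python
from datetime import datetime, timezone

# Placeholder strings; the real GraphQL documents are defined in the PR.
COMP_POSTS_QUERY = "new-query-placeholder"
COMP_POSTS_QUERY_LEGACY = "legacy-query-placeholder"

# Assumed boundary: midnight UTC at the start of March 1, 2025.
NEW_API_CUTOFF = datetime(2025, 3, 1, tzinfo=timezone.utc)


def select_query(post_date: datetime) -> str:
    """Return the query to use for a post created at post_date."""
    if post_date.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        post_date = post_date.replace(tzinfo=timezone.utc)
    if post_date > NEW_API_CUTOFF:
        return COMP_POSTS_QUERY
    return COMP_POSTS_QUERY_LEGACY
```

Keeping the cutoff in one named constant means the boundary can be adjusted in a single place if the exact switchover date turns out to differ.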
Since the new query only returns summaries, added a Selenium-based content extraction step:
- Uses a headless Chrome driver to visit the post page and grab the content via a stable CSS selector.
- Verified that LeetCode public pages currently have no rate limits, which makes this the only viable approach, even though it is slower.
Updated refresh.py to:
- Dynamically create and tear down the Chrome driver when needed.
- Fall back to using summaries if Selenium fails.
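The Selenium step and the summary fallback could look roughly like the sketch below. It is an illustration under assumptions, not the contents of refresh.py: `chrome_driver`, `extract_with_selenium`, and `content_or_summary` are hypothetical names, the CSS selector is supplied by the caller, and the driver is created with Selenium's built-in driver discovery rather than webdriver-manager.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def chrome_driver() -> Iterator[object]:
    """Create a headless Chrome driver and always quit it afterwards."""
    # Imported lazily so Selenium is only needed when scraping actually runs.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        yield driver
    finally:
        driver.quit()


def extract_with_selenium(driver, url: str, selector: str, timeout: float = 10.0) -> str:
    """Load url and return the text of the first element matching selector."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    element = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, selector))
    )
    return element.text


def content_or_summary(summary: str, extract: Callable[[], str]) -> str:
    """Return the extracted content, or the summary if extraction fails or is empty."""
    try:
        content = extract()
    except Exception:
        # Any scraping failure falls back to the summary from the new query.
        return summary
    return content.strip() or summary


# Hypothetical usage (url, summary, and selector come from the caller):
# with chrome_driver() as driver:
#     text = content_or_summary(
#         summary, lambda: extract_with_selenium(driver, url, selector)
#     )
```

Passing the extraction step as a callable keeps the fallback logic testable without launching a browser.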

Additional Notes

I used this to extract ~1400 more posts on my local machine and ran them through the LLM; those data changes are in a separate PR to keep this one uncluttered.
Some unrelated pre-commit checks were already failing on master; the fixes for those are also included.
This PR adds new dependencies (selenium, webdriver-manager).
While Selenium crawling is slower, it has proven reliable and ensures we get the full content.
More discussion is needed on:
- Whether Selenium should be the long-term solution or if alternatives should be explored.
- How to name legacy-related code (e.g. _legacy suffix) and organize code structure better.

Looking forward to feedback on:

The overall approach (Selenium vs Playwright vs other methods) and how it would scale on GitHub Actions, etc.
Any edge cases you come across that you think should be handled.

anxkhn added 2 commits September 12, 2025 11:11

feat: update queries and enhance post fetching with Selenium support

71a78d2

refactor: passing all pre-commit checks

461d26e
