feat: support new LeetCode discuss API (post-March 2025) #62
LeetCode changed the data structure of its compensation posts starting March 2025, which makes our current parsing logic obsolete. The new GraphQL query used by https://leetcode.com/discuss/ (ugcArticleDiscussionArticles) returns only a summary instead of the full content of each post.
After investigating, there seems to be no available GraphQL query that returns the complete description for these new post IDs.
Approach
Introduced a new query (COMP_POSTS_QUERY) for fetching posts created after March 2025.
Retained the existing query as COMP_POSTS_QUERY_LEGACY for older posts.
Implemented a date-based switch:
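As a rough illustration of such a date-based switch (not the code in this PR), the sketch below picks a query from a post's creation time. The cutoff of 2025-03-01 UTC, the function name `select_query`, and the placeholder query strings are all assumptions; the description only says "March 2025".

```python
from datetime import datetime, timezone

# Placeholder strings: the real GraphQL documents are defined in the PR itself.
COMP_POSTS_QUERY = "<new GraphQL query document>"
COMP_POSTS_QUERY_LEGACY = "<legacy GraphQL query document>"

# Assumed cutoff. The PR description only says "March 2025".
NEW_API_CUTOFF = datetime(2025, 3, 1, tzinfo=timezone.utc)


def select_query(created_at: datetime) -> str:
    """Return the query to use for a post created at `created_at`."""
    # Treat naive timestamps as UTC so the comparison below cannot raise.
    if created_at.tzinfo is None:
        created_at = created_at.replace(tzinfo=timezone.utc)
    if created_at >= NEW_API_CUTOFF:
        return COMP_POSTS_QUERY
    return COMP_POSTS_QUERY_LEGACY
```

Keeping the cutoff in one named constant means a later correction to the exact date touches a single line.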
Since the new query only returns summaries, added a Selenium-based content extraction step:
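A minimal sketch of what a Selenium-based extraction step could look like is shown here. It is not the implementation in this PR. The selector `div.post-body`, the helper names, and the timeout values are hypothetical, and the manual polling loop stands in for Selenium's `WebDriverWait` only to keep the example testable without a browser.

```python
import time

# Same string value as selenium's By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"

# Hypothetical selector: the real one depends on LeetCode's current markup.
POST_BODY_SELECTOR = "div.post-body"


def build_driver():
    """Create a headless Chrome driver (needs selenium and webdriver-manager)."""
    # Imported lazily so the rest of the module loads without these packages.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    options.add_argument("--headless=new")
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)


def fetch_full_content(driver, url, selector=POST_BODY_SELECTOR,
                       timeout=15.0, poll=0.5):
    """Load `url` and return the first non-empty matching text, or None on timeout."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # The page renders client-side, so the element may appear late.
        for element in driver.find_elements(CSS_SELECTOR, selector):
            text = element.text.strip()
            if text:
                return text
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

A caller would typically create one driver with `build_driver()`, reuse it for every post, and call `driver.quit()` when the crawl finishes, since starting a browser per post would make the slow path even slower.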
Updated refresh.py to:
Additional Notes
I used this to extract ~1400 more posts on my local machine and ran them through an LLM. Those data changes are in a separate PR to keep this one uncluttered.
Some unrelated pre-commit tests were already failing on master; fixes for those are also included.
This PR adds new dependencies (selenium, webdriver-manager).
While Selenium crawling is slower, it has proven reliable and ensures we get the full content.
More discussion is needed on:
_legacy suffix) and organize code structure better.
Looking forward to feedback on: