
Conversation

@joise-s-arakkal commented Dec 16, 2025

YouTube Comment Classifier using Zero-Shot Learning
Analyze and classify top comments from any YouTube video into custom intent-style labels using 🤗 Hugging Face's facebook/bart-large-mnli. Built with a clean Gradio UI and deployed on both Google Colab and Hugging Face Spaces.
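
For reference, the classification itself relies on Hugging Face's zero-shot-classification pipeline with facebook/bart-large-mnli. A minimal, self-contained sketch of that core step (the comment text and candidate labels here are illustrative, not necessarily the notebook's actual label set):

```python
# Minimal sketch of zero-shot classification with facebook/bart-large-mnli.
# The comment text and candidate labels below are illustrative examples.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

comment = "Loved this video, the explanation at 3:20 finally made it click!"
labels = ["appreciation", "question", "criticism", "spam"]

result = classifier(comment, candidate_labels=labels)
print(result["labels"][0], round(result["scores"][0], 3))  # top label and its confidence
```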

Summary by CodeRabbit

Release Notes

  • New Features

    • Added YouTube Comment Classifier project with zero-shot learning capabilities for classifying comments into custom labels via Gradio interface.
  • Documentation

    • Added comprehensive README with setup and deployment guidance for Colab and Hugging Face Spaces.
  • Chores

    • Added MIT License and required project dependencies.


Added YouTube Comment Classifier project to the README.

coderabbitai bot commented Dec 16, 2025

Walkthrough

This pull request introduces a new YouTube Comment Classifier project using zero-shot learning with facebook/bart-large-mnli. It adds a Jupyter notebook implementing classification via Gradio UI with YouTube API integration, accompanied by project documentation, licensing, and required dependencies.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| Root README update: README.md | Added "YouTube Comment Classifier" entry under Beginner Projects > Chat Interfaces & UI section |
| YouTube Classifier Project Setup: youtube-zero-shot-classifier/LICENSE, youtube-zero-shot-classifier/README.md, youtube-zero-shot-classifier/requirements.txt | Introduced MIT license, project documentation describing the zero-shot classification approach, UI framework (Gradio), deployment options, and Python dependencies (transformers, gradio, google-api-python-client, torch) |
| YouTube Classifier Implementation: youtube-zero-shot-classifier/YouTube_Comment_Classifier_with_Zero_Shot_Learning.ipynb | New Jupyter notebook implementing the end-to-end workflow: API key handling, comment fetching from YouTube, zero-shot classification, and a live Gradio UI with real-time result table display |

Sequence Diagram

sequenceDiagram
    participant User
    participant Gradio as Gradio UI
    participant Notebook as Notebook Logic
    participant YouTube as YouTube API
    participant Model as Zero-Shot Classifier
    participant Display as Results Table

    User->>Gradio: Enter YouTube URL
    Gradio->>Notebook: Trigger Classification
    Notebook->>Notebook: Extract Video ID from URL
    Notebook->>YouTube: Fetch Top-Level Comments
    YouTube-->>Notebook: Return Comments List
    Notebook->>Notebook: Initialize Classifier<br/>(facebook/bart-large-mnli)
    
    loop For Each Comment
        Notebook->>Model: Classify Against Candidate Labels
        Model-->>Notebook: Return Label & Confidence
        Notebook->>Display: Append Result<br/>(comment, label, confidence)
        Notebook->>Gradio: Update Results Dataframe
        Gradio-->>User: Display Live Results
    end
    
    Note over Notebook: Up to 20 comments<br/>classified incrementally
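In code, the flow above maps roughly to a generator that the Gradio UI can stream results from. The sketch below is an approximation, not the notebook's exact implementation: classify_comments_live matches the function name referenced in the review comments further down, while extract_video_id, the LABELS set, and the yielded status strings are assumptions for illustration.

```python
# Approximate end-to-end flow (assumed helper names and labels; see note above).
import re
from googleapiclient.discovery import build
from transformers import pipeline

LABELS = ["appreciation", "question", "criticism", "spam"]  # illustrative label set

def extract_video_id(url: str):
    # Handles the common youtube.com/watch?v=... and youtu.be/... URL forms
    match = re.search(r"(?:v=|youtu\.be/)([A-Za-z0-9_-]{11})", url)
    return match.group(1) if match else None

def classify_comments_live(url: str, api_key: str):
    video_id = extract_video_id(url)
    if not video_id:
        yield "❌ Invalid YouTube URL", []
        return

    # Fetch up to 20 top-level comments via the YouTube Data API v3
    youtube = build("youtube", "v3", developerKey=api_key)
    response = youtube.commentThreads().list(
        part="snippet", videoId=video_id, maxResults=20, textFormat="plainText"
    ).execute()
    comments = [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in response.get("items", [])
    ]

    # Classify each comment and yield incrementally so the UI updates live
    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    rows = []
    for comment in comments:
        result = classifier(comment, candidate_labels=LABELS)
        rows.append([comment, result["labels"][0], round(result["scores"][0], 3)])
        yield f"Classified {len(rows)}/{len(comments)} comments", rows
```

Wiring a gr.Interface (or gr.Blocks) to a generator like this is what produces the live-updating results table described in the walkthrough.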

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Focus areas: The notebook implementation should be reviewed for proper error handling (invalid URLs, API failures) and API key security practices in the Colab environment
  • Dependency compatibility: Verify that listed dependencies (transformers, gradio, google-api-python-client, torch) are compatible versions for the notebook environment
  • API rate limiting: Confirm YouTube API usage respects quota and rate limits for production-like scenarios

Possibly related PRs

  • Chore/readme #201: Updates README project listings with overlapping modifications to the Beginner Projects section and Chat Interfaces & UI category documentation

Poem

🐰 A classifier hops through comments bright,
Zero-shot learning—pure delight!
YouTube's chatter, sorted with care,
Through Gradio's UI, results laid bare.
Whiskers twitch at labels found,
AI magic—safe and sound! 🎯

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Title check | ❓ Inconclusive | The title 'Added a new project' is vague and generic. While technically accurate, it fails to convey specific information about what project was added or its purpose. | Use a more descriptive title that identifies the specific project, such as 'Add YouTube Comment Classifier with zero-shot learning' or similar. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
youtube-zero-shot-classifier/YouTube_Comment_Classifier_with_Zero_Shot_Learning.ipynb (4)

2076-2076: Consider adding torch to the installation command.

The requirements.txt file includes torch, but it's missing from this pip install command. While torch gets installed as a transitive dependency of transformers, explicitly listing it improves clarity and matches the requirements file.

Apply this diff:

-!pip install gradio transformers google-api-python-client --quiet
+!pip install gradio transformers google-api-python-client torch --quiet

2094-2095: Add validation for missing API key.

If YOUTUBE_API_KEY is not set in Colab userdata, the code will fail later with a cryptic error. Adding early validation improves the user experience.

Apply this diff:

 # Load API key securely from Colab
 API_KEY = userdata.get('YOUTUBE_API_KEY')
+if not API_KEY:
+    raise ValueError(
+        "❌ YOUTUBE_API_KEY not found in Colab secrets. "
+        "Please add it via the key icon in the left sidebar."
+    )

2112-2133: Consider documenting YouTube API quota limits.

The YouTube Data API has daily quota limits (typically 10,000 units/day for free tier, with comment fetches costing 1 unit per request). For production use, consider adding rate limiting or error handling for quota exceeded errors.

This is not a blocking issue for a beginner demo project, but users should be aware of potential quota limitations if they process many videos.
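
One lightweight mitigation, not part of this PR, is to wrap the fetch call in a small retry with backoff so transient rate-limit errors do not immediately surface as failures. A sketch assuming the notebook's googleapiclient client object (the helper name and delays are illustrative):

```python
# Sketch: retry the comment fetch with exponential backoff on transient 403s.
# Note: a daily quotaExceeded error will not be fixed by retrying; only
# short-lived rateLimitExceeded responses benefit from backoff.
import time
from googleapiclient.errors import HttpError

def fetch_comments_with_backoff(youtube, video_id, max_results=20, attempts=3):
    for attempt in range(attempts):
        try:
            return youtube.commentThreads().list(
                part="snippet", videoId=video_id,
                maxResults=max_results, textFormat="plainText",
            ).execute()
        except HttpError as err:
            if err.resp.status == 403 and attempt < attempts - 1:
                time.sleep(2 ** attempt)  # 1s, 2s, ... between retries
                continue
            raise
```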


2148-2148: Set multi_label=False for the single-best-label use case.

Since you only use the top label (result["labels"][0]), multi_label=False is the better fit: it normalizes the entailment scores across the candidate labels into a single distribution, whereas multi_label=True scores each label independently, so the scores need not sum to 1. The runtime cost is essentially the same either way; the change is about matching the output semantics to how the result is used.

Apply this diff:

-            result = classifier(comment, candidate_labels=LABELS, multi_label=True)
+            result = classifier(comment, candidate_labels=LABELS, multi_label=False)
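
To make the difference concrete, here is a small illustration (text and labels are made up): with multi_label=False the scores form one distribution over the candidate labels, while multi_label=True scores each label on its own.

```python
# Sketch comparing the two modes of the zero-shot pipeline; inputs are illustrative.
from transformers import pipeline

clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
text = "How do I run this notebook on my own video?"
labels = ["question", "appreciation", "spam"]

single = clf(text, candidate_labels=labels, multi_label=False)
multi = clf(text, candidate_labels=labels, multi_label=True)

print(sum(single["scores"]))  # ~1.0: scores normalized across labels
print(sum(multi["scores"]))   # typically != 1.0: each label scored independently
# In both modes result["labels"] is sorted by score, so ["labels"][0] is the top label.
```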
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ce9513 and b6aeb67.

📒 Files selected for processing (5)
  • README.md (1 hunks)
  • youtube-zero-shot-classifier/LICENSE (1 hunks)
  • youtube-zero-shot-classifier/README.md (1 hunks)
  • youtube-zero-shot-classifier/YouTube_Comment_Classifier_with_Zero_Shot_Learning.ipynb (1 hunks)
  • youtube-zero-shot-classifier/requirements.txt (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
youtube-zero-shot-classifier/YouTube_Comment_Classifier_with_Zero_Shot_Learning.ipynb

78-78: Do not catch blind exception: Exception

(BLE001)


79-79: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🔇 Additional comments (4)
README.md (1)

81-81: LGTM!

The project entry is well-formatted and appropriately placed in the Beginner Projects section. The description clearly communicates the project's purpose.

youtube-zero-shot-classifier/LICENSE (1)

1-21: LGTM!

Standard MIT License properly formatted with appropriate copyright attribution.

youtube-zero-shot-classifier/README.md (1)

1-40: LGTM!

The documentation is well-structured and comprehensive. The inclusion of deployment badges and clear feature descriptions makes it easy for users to understand and try the project.

youtube-zero-shot-classifier/YouTube_Comment_Classifier_with_Zero_Shot_Learning.ipynb (1)

2161-2178: LGTM!

The Gradio UI is well-structured with clear labels, appropriate components, and good UX design. The dynamic dataframe with live updates provides a nice user experience.

Comment on lines +1 to +4
transformers
gradio
google-api-python-client
torch

⚠️ Potential issue | 🟠 Major

Pin dependency versions for reproducibility.

Without version constraints, the project may break when dependencies release updates with breaking changes. This is particularly important for transformers and torch, which frequently introduce API changes.

Apply this diff to pin to current stable versions:

-transformers
-gradio
-google-api-python-client
-torch
+transformers>=4.30.0,<5.0.0
+gradio>=4.0.0,<5.0.0
+google-api-python-client>=2.100.0,<3.0.0
+torch>=2.0.0,<3.0.0

Alternatively, generate a requirements.txt with exact versions using pip freeze after testing:

#!/bin/bash
# Generate pinned requirements after successful testing
pip freeze | grep -E "^(transformers|gradio|google-api-python-client|torch)==" > requirements.txt
🤖 Prompt for AI Agents
In youtube-zero-shot-classifier/requirements.txt around lines 1 to 4, the
dependencies are unpinned which risks breakage from upstream changes; update the
file to pin each package to a specific, tested version (e.g., replace each
package name with package==<version>) and either (a) choose current stable
versions you verified work in CI/dev, or (b) run pip freeze in a validated
environment and write the exact versions for transformers, gradio,
google-api-python-client, and torch into requirements.txt so installs are
reproducible.

Comment on lines +2157 to +2158
" except Exception as e:\n",
" yield f\"❌ Error: {str(e)}\", []\n",

⚠️ Potential issue | 🟠 Major

Catch specific exceptions instead of bare Exception.

Catching all exceptions masks errors and makes debugging harder. The YouTube API can raise specific exceptions like HttpError for quota limits or invalid video IDs, and the classifier can raise RuntimeError for model issues.

Apply this diff:

+from googleapiclient.errors import HttpError
+
 # ... (in classify_comments_live function)
 
-    except Exception as e:
-        yield f"❌ Error: {str(e)}", []
+    except HttpError as e:
+        if e.resp.status == 403:
+            yield "❌ Error: YouTube API quota exceeded. Please try again later.", []
+        elif e.resp.status == 404:
+            yield "❌ Error: Video not found or comments are disabled.", []
+        else:
+            yield f"❌ YouTube API Error: {str(e)}", []
+    except RuntimeError as e:
+        yield f"❌ Model Error: {str(e)}", []
+    except Exception as e:
+        yield f"❌ Unexpected Error: {str(e)}", []

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
youtube-zero-shot-classifier/YouTube_Comment_Classifier_with_Zero_Shot_Learning.ipynb
around lines 2157-2158, replace the bare "except Exception as e" with handlers
for the specific errors the code can raise: import and catch
googleapiclient.errors.HttpError to handle YouTube API problems (quota, invalid
IDs) and catch RuntimeError (or the specific model exceptions your classifier
raises) for model-related failures, yielding a clear error message and details
for each; keep a final broad except only as a last-resort fallback (log full
details and re-raise or return a safe failure) so you no longer silently swallow
all exceptions.
