37 changes: 21 additions & 16 deletions README.md
@@ -21,16 +21,15 @@ Run open‑source content moderation models (NSFW, nudity, etc.) with one line

## 🚀 Performance

NSFW image detection performance of `nsfw-detector-mini` compared with [Azure Content Safety AI](https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety) and [Falconsai](https://huggingface.co/Falconsai/nsfw_image_detection).
NSFW image detection performance on the LSPD test set. Models with `nsfw-detection-2` prefix support 5-class classification (safe, porn, hentai, drawing, sexy). **F_macro** is the macro-averaged F1 score across all classes.

**F_safe** and **F_nsfw** below are class-wise F1 scores for safe and nsfw classes, respectively. Results show that `nsfw-detector-mini` performs better than Falconsai and Azure AI with fewer parameters.

| Model | F_safe | F_nsfw | Params |
| ------------------------------------------------------------------------------------ | ---------: | ---------: | ------: |
| [nsfw-detector-nano](https://huggingface.co/viddexa/nsfw-detection-nano) | 96.91% | 96.87% | 4M |
| **[nsfw-detector-mini](https://huggingface.co/viddexa/nsfw-detector-mini)** | **97.90%** | **97.89%** | **17M** |
| [Azure AI](https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety) | 96.79% | 96.57% | N/A |
| [Falconsai](https://huggingface.co/Falconsai/nsfw_image_detection) | 89.52% | 89.32% | 85M |
| Model | F_macro | F_safe | F_porn | F_hentai | F_drawing | F_sexy | Params |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| [nsfw-detection-2-nano](https://huggingface.co/viddexa/nsfw-detection-2-nano) | 93.00% | 96.82% | 96.34% | 93.43% | 93.24% | 85.15% | 4M |
| **[nsfw-detection-2-mini](https://huggingface.co/viddexa/nsfw-detection-2-mini)** | **96.09%** | **98.59%** | **98.05%** | **96.06%** | **96.83%** | **90.92%** | **17M** |
| [nsfw-detection-1-mini](https://huggingface.co/viddexa/nsfw-detection-mini) | N/A | 97.90% | N/A | N/A | N/A | N/A | 17M |
| [Azure AI](https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety) | N/A | 96.79% | N/A | N/A | N/A | N/A | N/A |
| [Falconsai](https://huggingface.co/Falconsai/nsfw_image_detection) | N/A | 89.52% | N/A | N/A | N/A | N/A | 85M |

## 📦 Installation

@@ -48,7 +47,7 @@ For detailed installation options, see the [Installation Guide](docs/INSTALLATIO
from moderators import AutoModerator

# Load from the Hugging Face Hub (e.g., NSFW image classifier)
moderator = AutoModerator.from_pretrained("viddexa/nsfw-detector-mini")
moderator = AutoModerator.from_pretrained("viddexa/nsfw-detection-2-mini")

# Run on a local image path
result = moderator("/path/to/image.jpg")
@@ -59,7 +58,7 @@ print(result)

```bash
# Image classification
moderators viddexa/nsfw-detector-mini /path/to/image.jpg
moderators viddexa/nsfw-detection-2-mini /path/to/image.jpg

# Text classification
moderators distilbert/distilbert-base-uncased-finetuned-sst-2-english "I love this!"
@@ -75,15 +74,21 @@ Moderators' normalized JSON output:
[
{
"source_path": "",
"classifications": { "safe": 0.9999891519546509 },
"classifications": { "safe": 0.9998 },
"detections": [],
"raw_output": { "label": "safe", "score": 0.9998 }
},
{
"source_path": "",
"classifications": { "drawing": 0.0001 },
"detections": [],
"raw_output": { "label": "safe", "score": 0.9999891519546509 }
"raw_output": { "label": "drawing", "score": 0.0001 }
},
{
"source_path": "",
"classifications": { "nsfw": 0.000010843970812857151 },
"classifications": { "sexy": 0.0001 },
"detections": [],
"raw_output": { "label": "nsfw", "score": 0.000010843970812857151 }
"raw_output": { "label": "sexy", "score": 0.0001 }
}
]
```
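Downstream code can consume this shape directly — a minimal sketch (hypothetical variable names; the sample reuses the normalized shape shown above):

```python
import json

# Two items in the normalized shape shown above
output_text = """[
  {"source_path": "", "classifications": {"safe": 0.9998},
   "detections": [], "raw_output": {"label": "safe", "score": 0.9998}},
  {"source_path": "", "classifications": {"sexy": 0.0001},
   "detections": [], "raw_output": {"label": "sexy", "score": 0.0001}}
]"""

results = json.loads(output_text)
# classifications maps label -> score; keep the highest-scoring label per item
top_labels = [max(r["classifications"], key=r["classifications"].get) for r in results]
print(top_labels)  # -> ['safe', 'sexy']
```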
@@ -103,7 +108,7 @@

## 🎯 Pick a Model

- **From the Hub**: Pass a model ID like `viddexa/nsfw-detector-mini` or any compatible Transformers model
- **From the Hub**: Pass a model ID like `viddexa/nsfw-detection-2-mini` or any compatible Transformers model
- **From disk**: Pass a local folder that contains a `config.json` next to your weights

Moderators detects the task and integration from the config when possible, so you don't have to specify pipelines manually.
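What that config looks like on disk is visible in the test fixtures — a minimal sketch of a local `config.json` (keys taken from `tests/test_auto_model.py`; any additional fields are assumptions):

```json
{
  "architecture": "TransformersModerator",
  "task": "text-classification"
}
```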
14 changes: 7 additions & 7 deletions docs/API.md
@@ -12,9 +12,9 @@ Results are returned as a list of `PredictionResult` dataclass instances:
[
PredictionResult(
source_path='',
classifications={'NSFW': 0.9821},
classifications={'porn': 0.9821},
detections=[],
raw_output={'label': 'NSFW', 'score': 0.9821}
raw_output={'label': 'porn', 'score': 0.9821}
),
...
]
@@ -28,9 +28,9 @@ The CLI outputs the same structure as JSON:
[
{
"source_path": "",
"classifications": { "NSFW": 0.9821 },
"classifications": { "porn": 0.9821 },
"detections": [],
"raw_output": { "label": "NSFW", "score": 0.9821 }
"raw_output": { "label": "porn", "score": 0.9821 }
}
]
```
@@ -43,7 +43,7 @@ Use `dataclasses.asdict()` to convert Python results to JSON-ready dictionaries:
from dataclasses import asdict
from moderators import AutoModerator

moderator = AutoModerator.from_pretrained("viddexa/nsfw-detector-mini")
moderator = AutoModerator.from_pretrained("viddexa/nsfw-detection-2-mini")
result = moderator("/path/to/image.jpg")
json_ready = [asdict(r) for r in result]
print(json_ready)
@@ -65,7 +65,7 @@ print(json_ready)
```python
from moderators import AutoModerator

moderator = AutoModerator.from_pretrained("viddexa/nsfw-detector-mini")
moderator = AutoModerator.from_pretrained("viddexa/nsfw-detection-2-mini")
```

**From local directory:**
@@ -105,7 +105,7 @@ Supported tasks:

## Model Selection

- **From the Hub**: Pass a model ID like `viddexa/nsfw-detector-mini` or any compatible Transformers model
- **From the Hub**: Pass a model ID like `viddexa/nsfw-detection-2-mini` or any compatible Transformers model
- **From disk**: Pass a local folder that contains a `config.json` next to your model weights

The system automatically infers the task and integration from the config when possible.
14 changes: 7 additions & 7 deletions docs/CLI.md
@@ -10,7 +10,7 @@ moderators <model_id_or_local_dir> <input> [--local-files-only]

### Arguments

- `<model_id_or_local_dir>`: Hugging Face model ID (e.g., `viddexa/nsfw-detector-mini`) or path to local model directory
- `<model_id_or_local_dir>`: Hugging Face model ID (e.g., `viddexa/nsfw-detection-2-mini`) or path to local model directory
- `<input>`: Input data - either a file path (for images) or text string (for text models)
- `--local-files-only` (optional): Force offline mode using cached files only

@@ -25,13 +25,13 @@ moderators distilbert/distilbert-base-uncased-finetuned-sst-2-english "I love th
### Image Classification

```bash
moderators viddexa/nsfw-detector-mini /path/to/image.jpg
moderators viddexa/nsfw-detection-2-mini /path/to/image.jpg
```

### Offline Mode

```bash
moderators viddexa/nsfw-detector-mini /path/to/image.jpg --local-files-only
moderators viddexa/nsfw-detection-2-mini /path/to/image.jpg --local-files-only
```

## Output Format
@@ -42,9 +42,9 @@ The CLI prints a JSON array to stdout, making it easy to pipe or parse:
[
{
"source_path": "",
"classifications": { "NSFW": 0.9821 },
"classifications": { "porn": 0.9821 },
"detections": [],
"raw_output": { "label": "NSFW", "score": 0.9821 }
"raw_output": { "label": "porn", "score": 0.9821 }
}
]
```
@@ -55,9 +55,9 @@ The CLI prints a JSON array to stdout, making it easy to pipe or parse:
- Use `--local-files-only` to ensure no network requests are made
- Pipe output to `jq` for advanced JSON processing:
```bash
moderators viddexa/nsfw-detector-mini image.jpg | jq '.[0].classifications'
moderators viddexa/nsfw-detection-2-mini image.jpg | jq '.[0].classifications'
```
- Redirect output to a file for batch processing:
```bash
moderators viddexa/nsfw-detector-mini image.jpg > results.json
moderators viddexa/nsfw-detection-2-mini image.jpg > results.json
```
2 changes: 1 addition & 1 deletion docs/TROUBLESHOOTING.md
@@ -65,7 +65,7 @@
**Suggestions**:

- Use GPU acceleration (see "GPU not used" above)
- Try smaller models (e.g., `nsfw-detector-nano` instead of larger variants)
- Try smaller models (e.g., `nsfw-detection-2-nano` instead of larger variants)
- Consider batch processing for multiple inputs
- Check if auto-installation is downloading dependencies (first run only)

11 changes: 5 additions & 6 deletions examples/README.md
@@ -21,7 +21,7 @@ uv add moderators
from moderators import AutoModerator

# NSFW image classification model from the Hub
moderator = AutoModerator.from_pretrained("viddexa/nsfw-detector-mini")
moderator = AutoModerator.from_pretrained("viddexa/nsfw-detection-2-mini")

# Run on a local image path
result = moderator("/path/to/image.jpg")
@@ -30,7 +30,7 @@ print(result)

## Quickstart (Image, CLI)
```bash
moderators viddexa/nsfw-detector-mini /path/to/image.jpg
moderators viddexa/nsfw-detection-2-mini /path/to/image.jpg
```

Tip: Add `--local-files-only` to force offline usage if the files are already cached.
Expand All @@ -43,7 +43,7 @@ from pathlib import Path
from moderators import AutoModerator

images_dir = Path("/path/to/images")
model = AutoModerator.from_pretrained("viddexa/nsfw-detector-mini")
model = AutoModerator.from_pretrained("viddexa/nsfw-detection-2-mini")

for img_path in images_dir.glob("**/*"):
if img_path.suffix.lower() in {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif", ".avif"}:
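The loop body is truncated by the diff view; the extension filter itself can be isolated as a self-contained sketch (the helper name is hypothetical, and the moderator call is elided):

```python
from pathlib import Path

# Extension filter from the batch-processing example above
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif", ".avif"}

def iter_images(root: Path):
    """Yield files under root whose suffix matches the image filter."""
    for p in sorted(root.glob("**/*")):
        if p.suffix.lower() in IMAGE_EXTS:
            yield p
```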
@@ -78,13 +78,12 @@ python examples/benchmarks.py <model_id> <image_path> [--warmup N] [--repeats N]
Examples:
```bash
# Default backend (auto-detected)
python examples/benchmarks.py viddexa/nsfw-detector-mini /path/to/image.jpg --warmup 3 --repeats 20

python examples/benchmarks.py viddexa/nsfw-detection-2-mini /path/to/image.jpg --warmup 3 --repeats 20
```

Expected output (sample):
```
Model: viddexa/nsfw-detector-mini
Model: viddexa/nsfw-detection-2-mini
Backend: auto
Runs: 20, avg: 12.34 ms, p50: 11.80 ms, p90: 14.10 ms
```
2 changes: 1 addition & 1 deletion src/moderators/__init__.py
@@ -1,4 +1,4 @@
__version__ = "0.1.2"
__version__ = "0.1.3"

from .auto_model import AutoModerator

13 changes: 7 additions & 6 deletions src/moderators/integrations/transformers_moderator.py
@@ -1,6 +1,7 @@
from __future__ import annotations

import importlib
import inspect
import json
import sys
from pathlib import Path
@@ -21,8 +22,7 @@ class TransformersModerator(BaseModerator):
"""Moderator implementation using HuggingFace Transformers."""

def load_model(self) -> None:
"""
Build a transformers pipeline deterministically:
"""Build a transformers pipeline deterministically:
- Validate task.
- Ensure deps (transformers, DL framework, Pillow for image tasks).
- Try AutoProcessor (if local `preprocessor_config.json` exists).
@@ -119,10 +119,13 @@ def load_model(self) -> None:
if tokenizer is not None:
pipe_kwargs["tokenizer"] = tokenizer

# Pass framework for transformers 4.x; omit for 5.x+ where it was removed
if "framework" in inspect.signature(pipeline).parameters:
pipe_kwargs["framework"] = framework

self._pipe = pipeline(
task,
model=model_id,
framework=framework,
**pipe_kwargs,
)

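The guard added in this hunk can be exercised in isolation — a sketch of the same inspect-based check, with hypothetical stand-ins for the two `pipeline` signatures:

```python
import inspect

def pipeline_kwargs(pipeline_fn, framework="pt", **kwargs):
    """Add `framework` only if pipeline_fn accepts it (transformers 4.x did; 5.x removed it)."""
    if "framework" in inspect.signature(pipeline_fn).parameters:
        kwargs["framework"] = framework
    return kwargs

# Hypothetical stand-ins for the 4.x and 5.x signatures
def pipe_v4(task, model=None, framework=None, **kw): ...
def pipe_v5(task, model=None, **kw): ...

print(pipeline_kwargs(pipe_v4))  # -> {'framework': 'pt'}
print(pipeline_kwargs(pipe_v5))  # -> {}
```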
@@ -160,9 +163,7 @@ def _postprocess(self, model_outputs: Any) -> list[PredictionResult]:
return results

def save_pretrained(self, save_directory: str, **kwargs: Any) -> str:
"""Saves model + tokenizer + (processor / image_processor / feature_extractor) and refreshes/creates a
config.json with required moderator metadata.
"""
"""Save model artifacts and update config.json with moderator metadata."""
out_dir = Path(save_directory)
out_dir.mkdir(parents=True, exist_ok=True)

77 changes: 75 additions & 2 deletions tests/test_auto_model.py
@@ -1,5 +1,9 @@
import base64
import json
import re
import sys
import types
import urllib.request
from pathlib import Path

import pytest
@@ -84,7 +88,76 @@ def test_missing_config_json_raises(tmp_path):
AutoModerator.from_pretrained(str(tmp_path))


def test_hf_model_falconsai_nsfw_image_detection_integration_online(tmp_path):
def test_pipeline_framework_not_passed_on_real_transformers():
"""Verify inspect-based framework detection works with the installed transformers."""
import inspect

from transformers import pipeline

has_framework = "framework" in inspect.signature(pipeline).parameters
# transformers 5.x removed the framework param; 4.x had it
# Either way, our code in TransformersModerator.load_model checks this at runtime
assert isinstance(has_framework, bool)


def test_pipeline_framework_passed_for_v4_signature(tmp_path, monkeypatch):
"""Verify framework kwarg IS passed when the pipeline function accepts it (v4 behavior)."""
captured = {}

def fake_pipeline_v4(task, model=None, framework=None, **kwargs):
captured["framework"] = framework
return lambda inputs: {"label": "OK", "score": 0.9}

mod = types.ModuleType("transformers")
mod.pipeline = fake_pipeline_v4
monkeypatch.setitem(sys.modules, "transformers", mod)

model_dir = write_config(tmp_path, {"architecture": "TransformersModerator", "task": "text-classification"})
m = AutoModerator.from_pretrained(str(model_dir))
out = m("hello")

assert isinstance(out, list) and len(out) == 1
assert captured.get("framework") is not None, "framework should be passed to v4-style pipeline"


def test_pipeline_framework_omitted_for_v5_signature(tmp_path, monkeypatch):
"""Verify framework kwarg is NOT passed when the pipeline rejects it (v5 behavior)."""

def fake_pipeline_v5(task, model=None, **kwargs):
if "framework" in kwargs:
raise TypeError("unexpected keyword argument 'framework'")
return lambda inputs: {"label": "OK", "score": 0.9}

mod = types.ModuleType("transformers")
mod.pipeline = fake_pipeline_v5
monkeypatch.setitem(sys.modules, "transformers", mod)

model_dir = write_config(tmp_path, {"architecture": "TransformersModerator", "task": "text-classification"})
m = AutoModerator.from_pretrained(str(model_dir))
out = m("hello")

assert isinstance(out, list) and len(out) == 1


_REPO_ROOT = Path(__file__).resolve().parents[1]
_HF_URLS = sorted(
{
url
for md in [*_REPO_ROOT.glob("*.md"), *_REPO_ROOT.glob("docs/*.md"), *_REPO_ROOT.glob("examples/*.md")]
for url in re.findall(r"https://huggingface\.co/[\w-]+/[\w-]+", md.read_text())
}
)


@pytest.mark.parametrize("url", _HF_URLS)
def test_hf_model_links_valid(url):
"""Verify HuggingFace URLs found in markdown docs are not broken."""
req = urllib.request.Request(url, method="HEAD")
resp = urllib.request.urlopen(req, timeout=10)
assert resp.status == 200


def test_hf_model_viddexa_nsfw_detection_integration_online(tmp_path):
# If HF Hub is offline, skip
try:
from huggingface_hub.utils import is_offline_mode
@@ -100,7 +173,7 @@ def test_hf_model_falconsai_nsfw_image_detection_integration_online(tmp_path):
if str(os.environ.get("MODERATORS_DISABLE_AUTO_INSTALL", "")).lower() in ("1", "true", "yes"):
pytest.skip("Auto-install disabled; skipping online integration test.")

model_id = "Falconsai/nsfw_image_detection"
model_id = "viddexa/nsfw-detection-2-mini"
mod = AutoModerator.from_pretrained(model_id, local_files_only=False)
assert isinstance(mod, TransformersModerator)

Expand Down