A Python toolkit and web API for detecting “bad” content (profanity, hate speech, gore, etc.) in text using a zero‑shot AI classifier.
- Zero-shot classification with Hugging Face's `facebook/bart-large-mnli`.
- CLI tool (`bad_text_detector.py`) to scan single strings, files, or directories.
- FastAPI web service (`SafeTextContentApi.py`) with GET `/detect`, POST `/detect`, and POST `/detect/file`.
- Customizable labels, thresholds, cache directories, CORS, and an "is_safe" flag.
- Test script (`test_api.py`) with colorful console output.
- Python 3.8+
- pip
- Clone the repo:

  ```bash
  git clone https://github.com/im-syn/SafeContentText.git
  cd SafeContentText
  ```

- Create & activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # on Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install fastapi uvicorn transformers torch requests colorama
  ```
The following environment variables let you override defaults:
| Variable | Default | Description |
|---|---|---|
| `STC_API_HOST` | `127.0.0.1` | Host for the FastAPI server |
| `STC_API_PORT` | `8989` | Port for the FastAPI server |
| `STC_API_RELOAD` | `True` | Whether uvicorn runs with `--reload` |
| `HF_CACHE_DIR` | `<project-root>/hf_cache` | Where Hugging Face models are downloaded/cached |
To change them, for example:

```bash
export HF_CACHE_DIR=/data/models/zero_shot
export STC_API_HOST=0.0.0.0
export STC_API_PORT=8080
```
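The server reads these variables at startup. A minimal sketch of how such overrides are typically picked up (illustrative only; the names and defaults are taken from the table above, but the exact code in `SafeTextContentApi.py` may differ):

```python
import os
from pathlib import Path

# Defaults mirror the table above; each value can be overridden via the environment.
HOST = os.environ.get("STC_API_HOST", "127.0.0.1")
PORT = int(os.environ.get("STC_API_PORT", "8989"))
RELOAD = os.environ.get("STC_API_RELOAD", "True").lower() == "true"
CACHE_DIR = os.environ.get("HF_CACHE_DIR", str(Path(__file__).resolve().parent / "hf_cache"))
```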
CLI usage:

```bash
# Single text
python bad_text_detector.py --text "I hate you"

# File
python bad_text_detector.py --file comments.txt

# Directory of .txt files
python bad_text_detector.py --dir ./logs/

# Custom labels & threshold
python bad_text_detector.py -t "Example" -l "profanity,insult" -T 0.6

# Save JSON output
python bad_text_detector.py -t "Test" -o results.json

# Verbose logging
python bad_text_detector.py -t "Test" -v
```
- Start the server:

  ```bash
  python SafeTextContentApi.py
  # or
  uvicorn SafeTextContentApi:app --host localhost --port 8989 --reload
  ```
- Endpoints:

  - GET `/detect`
    - Query params: `texts=first&texts=second`, or `text=single` (if `ENABLE_TEXT_PARAM=True`); plus `labels=insult&labels=profanity` and `threshold=0.5`
  - POST `/detect` (a sample call with `requests` follows this list)
    - JSON body: `{ "texts": ["one", "two"], "labels": ["hate speech", "insult"], "threshold": 0.6 }`
  - POST `/detect/file`
    - Multipart form: `file` (a text/plain `.txt` upload), `labels` as form fields, `threshold` as a form field
- Run:

  ```bash
  python test_api.py
  ```

  It will exercise all three endpoints and print colored results.
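For example, assuming the server is running on the default `127.0.0.1:8989`, the POST `/detect` endpoint can be called from Python with `requests`. This is a minimal sketch; the request fields and response keys follow the JSON body and example responses documented in this README:

```python
import requests

resp = requests.post(
    "http://localhost:8989/detect",
    json={
        "texts": ["I hate you", "i love cats"],
        "labels": ["hate speech", "insult"],
        "threshold": 0.6,
    },
    timeout=60,
)
resp.raise_for_status()

# Each result carries the raw scores, the labels above the threshold, and is_safe.
for result in resp.json()["results"]:
    verdict = "SAFE" if result["is_safe"] else f"flagged: {result['flagged_labels']}"
    print(f"{result['text']!r} -> {verdict}")
```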
GET http://localhost:8989/detect?text=sex

```json
{
"results": [
{
"text": "sex",
"scores": {
"sexual content": 0.999351501464844,
"profanity": 0.937217891216278,
"graphic violence": 0.639289915561676,
"insult": 0.146385952830315,
"hate speech": 0.00694961939007044,
"self-harm": 0.00394586473703384,
"terrorism": 0.000741451571229845
},
"flagged_labels": {
"sexual content": 0.999351501464844,
"profanity": 0.937217891216278,
"graphic violence": 0.639289915561676
},
"is_safe": false
}
]
}
```

GET http://localhost:8989/detect?text=i%20love%20cats

```json
{
"results": [
{
"text": "i love cats",
"scores": {
"sexual content": 0.00827411003410816,
"profanity": 0.00117546890396625,
"graphic violence": 0.00117279822006822,
"insult": 0.000300033862004056,
"self-harm": 0.000142173827043734,
"hate speech": 0.000105988452560268,
"terrorism": 0.000031683866836829
},
"flagged_labels": {},
"is_safe": true
}
]
}
```

Line-by-line walkthrough of `SafeTextContentApi.py`:

| Line | Code | Explanation |
|---|---|---|
| 1 | `#!/usr/bin/env python3` | Shebang: run with the system's Python 3 interpreter. |
| 2–9 | `""" … """` | Module docstring describing purpose & endpoints. |
| 11–14 | `import …` | Import standard libraries plus FastAPI, pydantic, and transformers. |
| 17–23 | Configurable options (`ALLOW_GET`, `HOST`, `CACHE_DIR`, …) | Toggle GET/POST/file endpoints, CORS, host/port, reload, and the cache directory. |
| 26–34 | `DEFAULT_LABELS = [...]` | Default categories to detect. |
| 37–43 | `logging.basicConfig(...)` | Configure the global logging format & level. |
| 46 | `os.makedirs(CACHE_DIR, exist_ok=True)` | Ensure the cache directory exists. |
| 49–54 | `classifier = pipeline(...)` | Load the zero-shot model into `CACHE_DIR` once at startup. |
| 57–68 | `class DetectRequest(BaseModel): ...` | Pydantic model for JSON POST requests. |
| 70–78 | `class DetectResult(BaseModel): ...` | Pydantic model for each classification result (includes `is_safe` if enabled). |
| 80–83 | `class DetectResponse(BaseModel): ...` | Pydantic model wrapping `List[DetectResult]`. |
| 86–93 | `app = FastAPI(...)` | Create the FastAPI instance with metadata. |
| 96–103 | `if ENABLE_CORS: app.add_middleware(CORSMiddleware, …)` | Conditionally enable CORS. |
| 106–124 | `def classify_texts(...):` | Helper to run zero-shot classification and compute `flagged_labels` and `is_safe`. |
| 127–136 | Error handlers for validation, HTTPException, and general exceptions | Return consistent JSON error responses. |
| 139–147 | `@app.post("/detect")` | POST `/detect` endpoint: reads the JSON body and calls `classify_texts`. |
| 149–166 | `@app.get("/detect")` | GET `/detect`: supports a `texts` list or a single `text` param. |
| 168–180 | `@app.post("/detect/file")` | File-upload endpoint: reads a `.txt` file and splits its lines into separate texts. |
| 183–190 | `if __name__ == "__main__": uvicorn.run(...)` | Uvicorn runner block; allows `python SafeTextContentApi.py` to start the server. |
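To make the flow concrete, here is a rough sketch of what the `classify_texts` helper described above (lines 106–124) could look like: run the zero-shot pipeline, keep every label scoring at or above the threshold as `flagged_labels`, and derive `is_safe`. This is an approximation for illustration, not the file's exact code:

```python
from transformers import pipeline

# Loaded once at startup; the real module directs model downloads to CACHE_DIR.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def classify_texts(texts, labels, threshold=0.5):
    """Run zero-shot classification on a list of texts and flag labels scoring >= threshold."""
    if isinstance(texts, str):
        texts = [texts]
    outputs = classifier(texts, candidate_labels=labels, multi_label=True)
    if isinstance(outputs, dict):  # a single-item input can come back as a bare dict
        outputs = [outputs]
    results = []
    for text, out in zip(texts, outputs):
        scores = dict(zip(out["labels"], out["scores"]))
        flagged = {label: score for label, score in scores.items() if score >= threshold}
        results.append({
            "text": text,
            "scores": scores,
            "flagged_labels": flagged,
            "is_safe": not flagged,
        })
    return results
```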
Line-by-line walkthrough of `bad_text_detector.py`:

| Line | Code | Explanation |
|---|---|---|
| 1 | `#!/usr/bin/env python3` | Shebang for CLI usage. |
| 2–9 | `""" … """` | Module docstring describing the CLI functionality. |
| 11–18 | `import …` | Standard libraries plus `transformers.pipeline`. |
| 21–28 | `DEFAULT_LABELS` | Categories to detect by default. |
| 30–38 | `configure_logging(...)` | Sets up timestamped, leveled console logs. |
| 40–51 | `load_texts_from_dir(directory)` | Recursively collects the contents of `.txt` files. |
| 53–62 | `detect_bad_content(classifier, texts, labels)` | Runs zero-shot classification and normalizes the output to a list. |
| 64–106 | `main()` | Main function, broken down in the rows below. |
| 66–75 | | Parse CLI arguments (`--text`, `--file`, `--dir`, `--labels`, etc.). |
| 77–91 | | Build the inputs dict from a string, file, or directory. |
| 93 | `pipeline("zero-shot-classification", …)` | Load the zero-shot model. |
| 95 | `detect_bad_content(...)` | Run detection on the collected inputs. |
| 97–104 | | Print flagged vs. safe texts and optionally write the `--output` JSON file. |
| 108 | `if __name__ == "__main__": main()` | CLI entry point. |
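As an illustration of the directory-scanning step, `load_texts_from_dir` could plausibly look like the sketch below (the real helper's return type and error handling may differ):

```python
from pathlib import Path

def load_texts_from_dir(directory):
    """Recursively collect the contents of every .txt file under `directory`.

    Returns a dict mapping file path -> file contents, so results can be
    reported per file.
    """
    texts = {}
    for path in sorted(Path(directory).rglob("*.txt")):
        texts[str(path)] = path.read_text(encoding="utf-8", errors="ignore")
    return texts
```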
Line-by-line walkthrough of `test_api.py`:

| Line | Code | Explanation |
|---|---|---|
| 1 | `#!/usr/bin/env python3` | Shebang for CLI use. |
| 2–9 | `""" … """` | Module docstring explaining that the script tests all endpoints. |
| 11–13 | `import …` | `requests`, plus `colorama` for colored output. |
| 16 | `BASE_URL = "http://localhost:8989"` | Base address for the API. |
| 18–22 | `pretty_print(title, color)` | Prints a colored separator & title. |
| 24–41 | `test_get()` | Calls GET `/detect?…` and prints the JSON or error in color. |
| 43–60 | `test_post_json()` | POSTs JSON to `/detect` and prints the result. |
| 62–83 | `test_post_file()` | POSTs a temp `.txt` file to `/detect/file`, prints the result, and cleans up. |
| 85–89 | `if __name__ == "__main__":` | Runs all three tests. |
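For reference, the GET test follows a pattern roughly like this sketch (function names match the table above; the query parameters and colors here are illustrative, not the script's exact code):

```python
import requests
from colorama import Fore, Style, init

init(autoreset=True)
BASE_URL = "http://localhost:8989"

def pretty_print(title, color=Fore.CYAN):
    """Print a colored separator line followed by the section title."""
    print(color + "=" * 40)
    print(color + title + Style.RESET_ALL)

def test_get():
    """Call GET /detect and print the JSON response (or the error) in color."""
    pretty_print("GET /detect")
    try:
        resp = requests.get(
            f"{BASE_URL}/detect",
            params={"texts": ["I hate you", "i love cats"]},  # sent as repeated texts= params
            timeout=60,
        )
        resp.raise_for_status()
        print(Fore.GREEN + str(resp.json()))
    except requests.RequestException as exc:
        print(Fore.RED + f"Request failed: {exc}")
```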
- Fork the repo
- Create your feature branch (`git checkout -b feature/XYZ`)
- Commit your changes (`git commit -m "Add XYZ"`)
- Push (`git push origin feature/XYZ`)
- Open a Pull Request
MIT © SYN
If this helped you, consider giving the repo a 🌟 or forking it to your toolkit. Thank you for using SafeContentText! Feel free to open issues or PRs for improvements.