5 changes: 4 additions & 1 deletion app/celery_setup/app.py
@@ -3,4 +3,7 @@

app = Celery('llm')
app.conf.update(**generate_config_from_env())
app.autodiscover_tasks(packages=['topic_prompt'])
app.autodiscover_tasks(packages=['topic_prompt',
                                 'commentary_scoring']
                       )

215 changes: 215 additions & 0 deletions app/commentary_scoring/README.md
@@ -0,0 +1,215 @@
# CommentaryScorer: Commentary–Citation Analysis Tool

**CommentaryScorer** is a Python tool that uses **LLMs** to analyze a **commentary** and determine, for **each cited base text**, whether the commentary actually **explains/interprets** it. It returns a **binary score (0/1)** per citation together with a **short rationale**.

---

## ⭐ Scores Extracted

- **Per-Citation Explanation Score**: `0` = not explained, `1` = explained
- **Per-Citation Rationale**: short reason string that begins with
`Explained spans: '<phrase1>'; '<phrase2>'` (or `'None'`)
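
To make the rationale format concrete, two hypothetical rationale strings might look like this (illustrative values, not actual model output):

```python
# Hypothetical rationale strings showing the required "Explained spans" prefix
scores_explanation = {
    "Genesis 1:1": "Explained spans: 'In the beginning'; 'created' – both phrases are actively interpreted.",
    "Genesis 1:2": "Explained spans: None – the verse is quoted only as a prooftext, not interpreted.",
}
```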

---

## 🚀 Quick Start

```python
from commentary_scoring.commentary_scoring import score_one_commentary
from sefaria_llm_interface.commentary_scoring import CommentaryScoringInput

inp = CommentaryScoringInput(
    commentary_ref="Rashi on Genesis 1:1",
    cited_refs={
        "Genesis 1:1": "In the beginning God created the heavens and the earth.",
        "Genesis 1:2": "Now the earth was formless and void..."
    },
    commentary_text="""
    Rashi on 'בראשית' explains sequencing/purpose and interprets terms...
    """
)

out = score_one_commentary(inp)

print("Scores:", out.ref_scores)
print("Reasons:", out.scores_explanation)

```

## 📦 Data Structures

### **Input – `CommentaryScoringInput`**

```python
{
    "commentary_ref": "Rashi on Genesis 1:1",   # Optional string identifier
    "cited_refs": {                             # Dict of citation → base text
        "Genesis 1:1": "In the beginning ...",
        "Genesis 1:2": "Now the earth ..."
    },
    "commentary_text": "Full commentary text (plain or HTML)"
}
```

- **commentary_ref**: identifier for the commentary (helpful for logging)
- **cited_refs**: dictionary mapping citation keys (e.g., `"Genesis 1:1"`) to their textual content
- **commentary_text**: commentary body text (string, can contain HTML, nested lists, etc.)

---

### **Output – `CommentaryScoringOutput`**

```python
{
    "commentary_ref": "Rashi on Genesis 1:1",
    "ref_scores": { "Genesis 1:1": 1, "Genesis 1:2": 0 },
    "scores_explanation": {
        "Genesis 1:1": "Explained spans: 'בראשית'; 'אלוקים' – Adds interpretive content ...",
        "Genesis 1:2": "Explained spans: None – Only a decorative quote, no interpretation ..."
    },
    "processed_datetime": "2025-08-19T10:30:00Z",
    "request_status": 1,
    "request_status_message": ""
}
```

- **ref_scores**: dictionary of binary scores per citation (0 = not explained, 1 = explained)
- **scores_explanation**: dictionary of rationales per citation, each beginning with **"Explained spans"**
- **processed_datetime**: UTC ISO8601 timestamp when scoring was done
- **request_status**: `1 = success`, `0 = failure`
- **request_status_message**: error description in case of failure

---

## ⚙️ Scoring System

### **Architecture**

The `commentary_scoring` package consists of:

- `commentary_scoring.py` – Main API with `score_one_commentary()`
- `openai_commentary_scorer.py` – Core LLM engine (`CommentaryScorer`)
- `tasks.py` – Celery task wrapper for async processing
- `text_utils.py` – Utilities for HTML stripping and flattening
- `README.md` – Documentation


---

### **Explanation Levels**

| Level | Description |
|-------|-------------|
| **0 – Not Explained** | Commentary does not interpret the cited text (decorative prooftext, paraphrase only, inherited interpretation). |
| **1 – Explained** | Commentary provides interpretation or explanation of any part of the cited text. |
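
As a rough illustration of the two levels (hypothetical citations, not taken from real data):

```python
# Hypothetical per-citation scores illustrating the two explanation levels
ref_scores = {
    "Genesis 1:1": 1,    # the commentary glosses the verse's wording and purpose
    "Psalms 104:24": 0,  # the verse is only quoted as a decorative prooftext
}
```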

---

## 🧠 Algorithm

### **Input Validation**
- Fail if `cited_refs` is empty or `commentary_text` is missing
- Token counting via `tiktoken` (fallback = character length)
- If too long → fail fast with `"input too long"`
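
A minimal sketch of the validation and token-budget check, assuming `tiktoken` is installed; the helper names and the exact budget arithmetic are illustrative, not the package's actual code:

```python
import logging

try:
    import tiktoken
except ImportError:
    tiktoken = None

logger = logging.getLogger("commentary_scoring")

def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    """Count tokens with tiktoken, falling back to character length."""
    if tiktoken is not None:
        try:
            return len(tiktoken.encoding_for_model(model).encode(text))
        except KeyError:
            pass  # model unknown to this tiktoken version
    logger.warning("tiktoken unavailable; using character length as token count")
    return len(text)

def validate_input(cited_refs: dict, commentary_text: str,
                   max_prompt_tokens: int = 32000, token_margin: int = 4096) -> None:
    """Fail fast on empty input or on text that cannot fit in the prompt budget."""
    if not cited_refs or not commentary_text:
        raise ValueError("cited_refs and commentary_text are required")
    if count_tokens(commentary_text) > max_prompt_tokens - token_margin:
        raise ValueError("input too long")
```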

### **Build Prompt**
- Commentary text + cited refs in structured sections
- Explicit instructions for binary labeling per citation
- Require **"Explained spans"** prefix in explanations
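
A sketch of how such a prompt could be assembled; the wording and section labels below are assumptions, not the actual prompt text used by the package:

```python
def build_prompt(commentary_text: str, cited_refs: dict) -> str:
    """Assemble a structured prompt: instructions, commentary body, then one section per citation."""
    parts = [
        "You are given a commentary and the base texts it cites.",
        "For EACH citation, return 1 if the commentary explains or interprets it, otherwise 0.",
        "Each rationale must begin with \"Explained spans: ...\" listing the interpreted phrases, or None.",
        "",
        "## Commentary",
        commentary_text.strip(),
        "",
        "## Cited base texts",
    ]
    for ref, text in cited_refs.items():
        parts.append(f"### {ref}\n{text.strip()}")
    return "\n".join(parts)
```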

### **Schema Enforcement**
- OpenAI function calling schema requires:
  - `ref_scores`: dict of citation → 0/1
  - `explanation`: dict of citation → rationale string
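
The function-calling schema could look roughly like the following; the tool name is hypothetical and the actual JSON Schema in `openai_commentary_scorer.py` may differ in detail:

```python
# Hypothetical tool definition enforcing the two required dictionaries
SCORING_TOOL = {
    "type": "function",
    "function": {
        "name": "report_citation_scores",  # assumed name, not from the source
        "description": "Binary explanation score and rationale per cited ref.",
        "parameters": {
            "type": "object",
            "properties": {
                "ref_scores": {
                    "type": "object",
                    "additionalProperties": {"type": "integer", "enum": [0, 1]},
                },
                "explanation": {
                    "type": "object",
                    "additionalProperties": {"type": "string"},
                },
            },
            "required": ["ref_scores", "explanation"],
        },
    },
}
```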

### **LLM Invocation**
- Config: `gpt-4o-mini`, `temperature=0`, `top_p=0`, `seed=42`
- Parse structured JSON output
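
A minimal sketch of the call with those settings, using the `openai` v1 client; the helper signature is illustrative and assumes a tool definition like the one sketched above:

```python
import json
import os

from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def invoke_llm(prompt: str, tool_schema: dict, tool_name: str) -> dict:
    """Call the model deterministically and parse the structured tool-call arguments."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        top_p=0,
        seed=42,
        messages=[{"role": "user", "content": prompt}],
        tools=[tool_schema],
        tool_choice={"type": "function", "function": {"name": tool_name}},
    )
    arguments = response.choices[0].message.tool_calls[0].function.arguments
    return json.loads(arguments)  # e.g. {"ref_scores": {...}, "explanation": {...}}
```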

### **Post-Processing**
- Clamp invalid values to `0` or `1`
- Return `CommentaryScoringOutput`
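
Clamping can be as simple as the following sketch (helper name and log message are illustrative):

```python
import logging

logger = logging.getLogger("commentary_scoring")

def clamp_scores(raw_scores: dict) -> dict:
    """Force every per-citation score to 0 or 1, warning when the model returned anything else."""
    clamped = {}
    for ref, value in raw_scores.items():
        try:
            score = int(value)
        except (TypeError, ValueError):
            score = 0
        if score not in (0, 1):
            logger.warning("Clamping invalid score %r for %s", value, ref)
            score = 1 if score > 1 else 0
        clamped[ref] = score
    return clamped
```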

---

## 🔧 Configuration Options

### **Initialization**

```python
import os

from commentary_scoring.openai_commentary_scorer import CommentaryScorer

scorer = CommentaryScorer(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o-mini",         # default model
    max_prompt_tokens=32000,     # max tokens for input prompt
    token_margin=4096            # reserved for model response
)
```

- **API Key**: via `OPENAI_API_KEY` environment variable or explicit parameter
- **Model**: defaults to `gpt-4o-mini`, override if needed
- **Token Guardrails**: ensures commentary fits within prompt budget

---

## 📜 Celery Integration

### **Task Wrapper**

```python
from dataclasses import asdict

from celery import shared_task
from sefaria_llm_interface.commentary_scoring import CommentaryScoringInput

from commentary_scoring.commentary_scoring import score_one_commentary

@shared_task(name='llm.score_commentary')
def score_sheet_task(raw_input: dict) -> dict:
    inp = CommentaryScoringInput(**raw_input)
    out = score_one_commentary(inp)
    return asdict(out)

### **Usage**

```python
from celery import signature

payload = {
    "commentary_ref": "Rashi on Genesis 1:1",
    "cited_refs": {"Genesis 1:1": "...", "Genesis 1:2": "..."},
    "commentary_text": "Rashi explains ..."
}
sig = signature("llm.score_commentary", args=[payload], queue="llm")
print(sig.apply_async().get())
```

---

## 📊 Output Fields

| Field | Description |
|------------------------|--------------------------------------------------|
| `ref_scores` | Binary 0/1 scores per citation |
| `scores_explanation` | Rationale strings beginning with `"Explained spans"` |
| `commentary_ref` | Commentary identifier |
| `processed_datetime` | UTC ISO8601 timestamp |
| `request_status` | `1 = success`, `0 = failure` |
| `request_status_message` | Error message if failure |

---

## 📝 Logging

- **Info**: token count, number of citations, success summary
- **Warning**: invalid scores clamped, tokenizer fallback
- **Error**: LLM or JSON parse failures

```python
import logging
logging.getLogger("commentary_scoring").setLevel(logging.INFO)
```

---

## ✅ Extensibility

- Very long commentaries are not currently supported, since none came up during testing. This feature may well never be needed, but the question should be explored.

---

Empty file.
17 changes: 17 additions & 0 deletions app/commentary_scoring/commentary_scoring.py
@@ -0,0 +1,17 @@
from .openai_commentary_scorer import CommentaryScorer
import os
from sefaria_llm_interface.commentary_scoring import (
    CommentaryScoringInput,
    CommentaryScoringOutput,
)


def score_one_commentary(inp: CommentaryScoringInput) -> CommentaryScoringOutput:
    scorer = CommentaryScorer(
        api_key=os.getenv("OPENAI_API_KEY")
    )
    return scorer.process_commentary_by_content(
        commentary_ref=inp.commentary_ref,
        cited_refs=inp.cited_refs,
        commentary_text=inp.commentary_text
    )