Ticket: Develop Local LLMs and Validation Content for Document Intelligence

Create instructional content and hands-on exercises for **Week 2, Session 2** and **Week 3, Session 1** of the Document Intelligence course. This work introduces students to **local language models** for document parsing and the **validation strategies** required to make model output reliable in production-like pipelines.

Content must follow the **Content Development Guide** and align to the established **Learning Objectives (LOs)**.

📘 **Content Development Guide**  
https://docs.google.com/document/d/1kZkrEpwPW0UkHe0kvl7tEVbKe7cLH622wTnoF_XNb40/edit?tab=t.kpgyevktwssa#bookmark=id.2bxrvou21vrt  

🎯 **Relevant Learning Objectives (Reference)**  
https://docs.google.com/document/d/1kZkrEpwPW0UkHe0kvl7tEVbKe7cLH622wTnoF_XNb40/edit?tab=t.kpgyevktwssa#bookmark=id.hgmn7xwnkkmt  

---

### Scope of Work

#### Week 2, Session 2: Working with Local Language Models
**Concept Content**
- Explain what a language model is and how it differs from rule-based and traditional NLP approaches
- Introduce local language models and their benefits and tradeoffs compared with API-based models
- Demonstrate how to run local models using Hugging Face and the `transformers` library
- Teach prompting techniques with an emphasis on:
  - Instruction clarity
  - Specifying structured output (JSON)
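
As a starting point for the prompting material, here is a minimal sketch of a prompt builder that states the instruction clearly and asks for JSON only. The field names, `build_invoice_prompt`, and `run_local_model` are hypothetical and chosen for illustration; the model name is left as a parameter because the ticket does not specify one.

```python
import json

# Hypothetical field names, used only for illustration.
INVOICE_FIELDS = ["invoice_number", "invoice_date", "vendor_name", "total"]


def build_invoice_prompt(invoice_text, fields=INVOICE_FIELDS):
    """Build a prompt that names the expected keys and asks for JSON only."""
    example = {name: None for name in fields}
    return (
        "Extract the following fields from the invoice text below.\n"
        "Respond with a single JSON object and nothing else.\n"
        "Use null for any field that is not present.\n"
        f"Expected keys: {json.dumps(example)}\n\n"
        f"Invoice text:\n{invoice_text}\n\nJSON:"
    )


def run_local_model(prompt, model_name, max_new_tokens=256):
    """Generate a completion locally; requires transformers and a downloaded model."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    result = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=False,          # greedy decoding for more repeatable output
        return_full_text=False,   # return only the completion, not the prompt
    )
    return result[0]["generated_text"]
```

Listing the expected keys in the prompt gives students something concrete to compare the model's output against later.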

**Hands-On Exercise**
- Guide students through using a local language model to parse invoice text
- Require the model to output structured JSON
- Include examples of imperfect or inconsistent model output for discussion
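
For the imperfect-output discussion, the following sketch shows one way to pull a JSON object out of a response that also contains extra prose. The sample strings are invented for illustration and are not real model output; `extract_json_object` is a hypothetical helper.

```python
import json


def extract_json_object(text):
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at index and ignores the rest.
            obj, _end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue
        return obj
    return None


# Invented examples of the kinds of output students may see.
clean = '{"invoice_number": "A-100", "total": 12.5}'
chatty = 'Sure! Here is the result: {"invoice_number": "A-100", "total": 12.5} Hope that helps.'
truncated = '{"invoice_number": "A-100", "total": '

results = [extract_json_object(s) for s in (clean, chatty, truncated)]
```

The truncated case cannot be recovered by parsing alone, which motivates the validation and retry work in the next session.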

**LO Alignment**
- Explain language model fundamentals  
- Apply local LLMs to document extraction  
- Design prompts that enforce structured output  
- Analyze LLM output quality  

---

#### Week 3, Session 1: Validation
**Concept Content**
- Explain why validation is necessary when working with model-generated data
- Introduce schema validation concepts (required fields, data types, constraints)
- Explain retry strategies for handling invalid or incomplete outputs
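
The schema concepts above (required fields, data types, constraints) can be illustrated without any third-party library. This is a minimal sketch; the field names and the non-negative rule for `total` are assumptions made for the example, not a schema defined by the course.

```python
# Hypothetical schema: field name -> accepted Python type(s).
INVOICE_SCHEMA = {
    "invoice_number": str,
    "invoice_date": str,
    "total": (int, float),
}


def validate_invoice(data):
    """Return a list of human-readable errors; an empty list means valid."""
    if not isinstance(data, dict):
        return ["output is not a JSON object"]
    errors = []
    for field, expected in INVOICE_SCHEMA.items():
        if field not in data:
            errors.append(f"missing required field: {field}")
            continue
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for {field}: {type(value).__name__}")
    total = data.get("total")
    if isinstance(total, (int, float)) and not isinstance(total, bool) and total < 0:
        errors.append("total must be non-negative")
    return errors
```

Returning a list of errors rather than a single boolean makes it easier to feed specific feedback into a retry prompt.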

**Hands-On Exercise**
- Implement schema validation for invoice JSON output
- Build a retry mechanism that:
  - Detects invalid output
  - Re-prompts or re-runs the model
- Encourage students to reason about failure modes (for example, malformed JSON, missing fields, or wrong data types)
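
The retry mechanism described above could be sketched as follows. `parse_with_retry`, `require_total`, and `fake_model` are hypothetical names; the fake model stands in for a real local model so the loop can be run without downloading anything.

```python
import json


def parse_with_retry(generate, base_prompt, validate, max_attempts=3):
    """Call generate until its output parses as JSON and passes validate."""
    prompt = base_prompt
    errors = []
    for attempt in range(1, max_attempts + 1):
        raw = generate(prompt)
        try:
            data = json.loads(raw)
            errors = validate(data)
        except json.JSONDecodeError as exc:
            data, errors = None, [f"invalid JSON: {exc.msg}"]
        if not errors:
            return data, attempt
        # Re-prompt with the specific problems found in the last attempt.
        prompt = (
            f"{base_prompt}\n\nYour previous answer was rejected: "
            f"{'; '.join(errors)}\nReturn only a corrected JSON object."
        )
    raise ValueError(f"no valid output after {max_attempts} attempts: {errors}")


def require_total(data):
    """Tiny stand-in validator: the output must be an object with a total."""
    if not isinstance(data, dict) or "total" not in data:
        return ["missing required field: total"]
    return []


# Canned responses simulating two bad attempts followed by a good one.
canned = iter(['Sure, here you go: {"total": 10}', '{"vendor": "Acme"}', '{"total": 10}'])


def fake_model(prompt):
    return next(canned)


invoice, attempts = parse_with_retry(fake_model, "Extract the total as JSON.", require_total)
```

Capping the number of attempts keeps the pipeline from looping indefinitely when a document cannot be parsed, and raising at the end makes that failure visible to the caller.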
