Description
Create instructional content and hands-on exercises for Week 2, Session 2 and Week 3, Session 1 of the Document Intelligence course. This work introduces students to local language models for document parsing and the validation strategies required to make model output reliable in production-like pipelines.
Content must follow the Content Development Guide and align to the established Learning Objectives (LOs).
📘 Content Development Guide
https://docs.google.com/document/d/1kZkrEpwPW0UkHe0kvl7tEVbKe7cLH622wTnoF_XNb40/edit?tab=t.kpgyevktwssa#bookmark=id.2bxrvou21vrt
🎯 Relevant Learning Objectives (Reference)
https://docs.google.com/document/d/1kZkrEpwPW0UkHe0kvl7tEVbKe7cLH622wTnoF_XNb40/edit?tab=t.kpgyevktwssa#bookmark=id.hgmn7xwnkkmt
Scope of Work
Week 2, Session 2: Working with Local Language Models
Concept Content
- Explain what a language model is and how it differs from rule-based and traditional NLP approaches
- Introduce local language models and the benefits/tradeoffs compared to API-based models
- Demonstrate how to run local models using HuggingFace and the `transformers` library
- Teach prompt techniques with an emphasis on:
  - Instruction clarity
  - Specifying structured output (JSON)
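As a starting point for the concept content, the sketch below shows one way to combine a clear instruction with an explicit JSON shape and pass the prompt to a locally run model through the `transformers` text-generation pipeline. The field names in `INVOICE_FIELDS`, the helper names `build_prompt` and `run_local_model`, and the generation settings are assumptions made for illustration, not part of the course materials.

```python
import json

# Hypothetical field names for the invoice exercise; the course may use others.
INVOICE_FIELDS = ["invoice_number", "invoice_date", "vendor_name", "total_amount"]


def build_prompt(invoice_text: str) -> str:
    """Build an instruction prompt that asks for a single JSON object only."""
    # Showing the expected shape with null placeholders makes the target explicit.
    template = {field: None for field in INVOICE_FIELDS}
    return (
        "Extract the following fields from the invoice below.\n"
        "Respond with a single JSON object and nothing else.\n"
        "Use null for any field that is not present.\n"
        f"Expected JSON shape: {json.dumps(template)}\n\n"
        f"Invoice:\n{invoice_text}\n\nJSON:"
    )


def run_local_model(prompt: str, model_name: str) -> str:
    """Generate a completion with a model loaded locally via transformers."""
    # Imported lazily so build_prompt works even without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    result = generator(
        prompt,
        max_new_tokens=200,      # assumed budget; tune for the chosen model
        do_sample=False,         # greedy decoding for more repeatable output
        return_full_text=False,  # return only the completion, not the prompt
    )
    return result[0]["generated_text"]
```

`model_name` is left as a parameter on purpose: which model students use depends on available hardware, and the issue does not name one.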
Hands-On Exercise
- Guide students through using a local language model to parse invoice text
- Require the model to output structured JSON
- Include examples of imperfect or inconsistent model output for discussion
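For the discussion of imperfect output, a small helper like the one below could be used to show students how differently shaped responses behave when parsed. It is a minimal sketch: the function name `extract_json` and the sample strings are invented for illustration and are not real model output.

```python
import json


def extract_json(raw: str):
    """Return the first JSON object found in raw model output, or None."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(raw):
        if ch != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at index i and
            # ignores any trailing text after it.
            obj, _ = decoder.raw_decode(raw, i)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None


# Illustrative samples of the kinds of output worth discussing in class.
clean = '{"invoice_number": "INV-001", "total_amount": 45.0}'
chatty = 'Here is the JSON: {"invoice_number": "INV-001", "total_amount": "45.00"} Hope that helps.'
truncated = '{"invoice_number": "INV-001", "total_amount":'
single_quoted = "{'invoice_number': 'INV-001'}"
```

The `chatty` sample parses but carries the amount as a string, while `truncated` and `single_quoted` do not parse at all, which gives two distinct failure categories to analyze.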
LO Alignment
- Explain language model fundamentals
- Apply local LLMs to document extraction
- Design prompts that enforce structured output
- Analyze LLM output quality
Week 3, Session 1: Validation
Concept Content
- Explain why validation is necessary when working with model-generated data
- Introduce schema validation concepts (required fields, data types, constraints)
- Explain retry strategies for handling invalid or incomplete outputs
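The three schema concepts listed above (required fields, data types, constraints) can be illustrated without any third-party library. The following is a sketch under assumptions: the schema contents, field names, and the `validate_invoice` helper are hypothetical, and a course implementation might reasonably use a dedicated validation library instead.

```python
from datetime import date

# Hypothetical schema: field name -> (expected type or types, required?)
INVOICE_SCHEMA = {
    "invoice_number": (str, True),
    "invoice_date": (str, True),
    "vendor_name": (str, True),
    "total_amount": ((int, float), True),
    "currency": (str, False),
}


def validate_invoice(data: dict) -> list[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []

    # Required fields and data types.
    for field, (expected, required) in INVOICE_SCHEMA.items():
        value = data.get(field)
        if value is None:
            if required:
                errors.append(f"{field}: missing required field")
            continue
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{field}: wrong type {type(value).__name__}")

    # Constraints on values that passed the type check.
    amount = data.get("total_amount")
    if isinstance(amount, (int, float)) and not isinstance(amount, bool) and amount < 0:
        errors.append("total_amount: must be non-negative")

    invoice_date = data.get("invoice_date")
    if isinstance(invoice_date, str):
        try:
            date.fromisoformat(invoice_date)
        except ValueError:
            errors.append("invoice_date: expected an ISO date such as 2024-01-31")

    return errors
```

Returning a list of messages rather than raising on the first problem keeps every failure visible at once, which is useful both for class discussion and for feeding errors back into a retry prompt.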
Hands-On Exercise
- Implement schema validation for invoice JSON output
- Build a retry mechanism that:
  - Detects invalid output
  - Re-prompts or re-runs the model
- Encourage students to reason about failure modes
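One possible shape for the retry exercise is sketched below. It assumes the model call and the validator are passed in as plain callables, so the loop can be exercised with a stub before a real model is connected. The name `extract_with_retry`, the default of three attempts, and the wording of the corrective prompt are all illustrative choices rather than requirements.

```python
import json
from typing import Callable, Optional


def extract_with_retry(
    generate: Callable[[str], str],
    validate: Callable[[dict], list],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[dict]:
    """Call the model until its output parses and validates, or give up."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = generate(current_prompt)

        # Step 1: detect invalid output (unparseable JSON or schema errors).
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            data, errors = None, [f"invalid JSON: {exc.msg}"]
        else:
            if isinstance(data, dict):
                errors = validate(data)
            else:
                errors = ["output is not a JSON object"]

        if not errors:
            return data

        # Step 2: re-prompt, feeding the detected errors back to the model.
        current_prompt = (
            f"{prompt}\n\nYour previous answer was rejected:\n- "
            + "\n- ".join(errors)
            + "\nReturn only the corrected JSON object."
        )

    # All attempts failed; the caller decides how to handle this case.
    return None
```

Returning `None` after the final attempt, instead of raising, is one design choice among several. Asking students what the pipeline should do at that point (skip the document, flag it for human review, fall back to rules) is a natural entry into the failure-mode discussion.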