Ticket: Develop Local LLMs and Validation Content for Document Intelligence #3

@gcziprusz

Description

Create instructional content and hands-on exercises for Week 2, Session 2 and Week 3, Session 1 of the Document Intelligence course. This work introduces students to local language models for document parsing and the validation strategies required to make model output reliable in production-like pipelines.

Content must follow the Content Development Guide and align with the established Learning Objectives (LOs).

📘 Content Development Guide
https://docs.google.com/document/d/1kZkrEpwPW0UkHe0kvl7tEVbKe7cLH622wTnoF_XNb40/edit?tab=t.kpgyevktwssa#bookmark=id.2bxrvou21vrt

🎯 Relevant Learning Objectives (Reference)
https://docs.google.com/document/d/1kZkrEpwPW0UkHe0kvl7tEVbKe7cLH622wTnoF_XNb40/edit?tab=t.kpgyevktwssa#bookmark=id.hgmn7xwnkkmt


Scope of Work

Week 2, Session 2: Working with Local Language Models

Concept Content

  • Explain what a language model is and how it differs from rule-based and traditional NLP approaches
  • Introduce local language models and the benefits/tradeoffs compared to API-based models
  • Demonstrate how to run local models using Hugging Face and the transformers library
  • Teach prompting techniques with an emphasis on:
    • Instruction clarity
    • Specifying structured output (JSON)
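
As a starting point for the session material, here is a minimal sketch of a JSON-focused prompt and a local generation call. The field list and the helper names (`INVOICE_FIELDS`, `build_prompt`, `run_local_model`) are assumptions for illustration, not part of the course spec. The model name is left as a parameter because the ticket does not name a model. The only library call used is the transformers `pipeline` API for text generation.

```python
import json

# Hypothetical field list for illustration; the course may define a different schema.
INVOICE_FIELDS = {
    "invoice_number": "string",
    "invoice_date": "string in YYYY-MM-DD format",
    "vendor_name": "string",
    "total_amount": "number",
}


def build_prompt(invoice_text: str) -> str:
    """Build an instruction prompt that names each field and asks for JSON only."""
    field_lines = "\n".join(
        f'- "{name}": {kind}' for name, kind in INVOICE_FIELDS.items()
    )
    # Show the expected output shape so the model has a concrete target.
    shape = json.dumps({name: None for name in INVOICE_FIELDS})
    return (
        "Extract the following fields from the invoice text.\n"
        f"{field_lines}\n"
        "Respond with a single JSON object and nothing else. "
        "Use null for any field that is not present.\n"
        f"Output shape: {shape}\n\n"
        f"Invoice text:\n{invoice_text}\n"
    )


def run_local_model(prompt: str, model_name: str, max_new_tokens: int = 200) -> str:
    """Generate a completion with a locally loaded Hugging Face model."""
    # Imported lazily so build_prompt works even without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=False,         # greedy decoding for more repeatable output
        return_full_text=False,  # return only the completion, not the prompt
    )
    return outputs[0]["generated_text"]
```

Splitting prompt construction from generation lets students inspect and iterate on the prompt text without loading a model each time.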

Hands-On Exercise

  • Guide students through using a local language model to parse invoice text
  • Require the model to output structured JSON
  • Include examples of imperfect or inconsistent model output for discussion
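
For the discussion of imperfect output, a small helper like the one below could be handed to students. It is a sketch under assumptions: `extract_json_object` is a hypothetical name, and the sample strings are invented to illustrate common failure shapes (extra prose around the JSON, truncation, no JSON at all), not captured from a real model run.

```python
import json
from typing import Any, Optional


def extract_json_object(raw: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object found in raw model output, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(raw):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None


# Invented examples of inconsistent output, for classroom discussion.
samples = {
    "clean": '{"invoice_number": "INV-001", "total_amount": 40.0}',
    "wrapped in prose": (
        'Sure! Here is the JSON: {"invoice_number": "INV-001", '
        '"total_amount": "40.00"} Let me know if you need more.'
    ),
    "truncated": '{"invoice_number": "INV-001", "total_amount": ',
    "prose only": "The invoice total is 40 dollars.",
}

for label, raw in samples.items():
    print(f"{label}: {extract_json_object(raw)}")
```

Note that the "wrapped in prose" sample parses successfully but returns `total_amount` as a string, which is a useful lead-in to why parsing alone is not validation.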

LO Alignment

  • Explain language model fundamentals
  • Apply local LLMs to document extraction
  • Design prompts that enforce structured output
  • Analyze LLM output quality

Week 3, Session 1: Validation

Concept Content

  • Explain why validation is necessary when working with model-generated data
  • Introduce schema validation concepts (required fields, data types, constraints)
  • Explain retry strategies for handling invalid or incomplete outputs
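
The three schema concepts above (required fields, data types, constraints) can be illustrated with a standard-library sketch like the following. The schema itself is an assumption made for this example; the course may prefer a dedicated validation library, and `validate_invoice` is a hypothetical helper name.

```python
from datetime import date
from typing import Any

# Hypothetical schema: field name -> (expected type or types, required?)
INVOICE_SCHEMA = {
    "invoice_number": (str, True),
    "invoice_date": (str, True),
    "vendor_name": (str, True),
    "total_amount": ((int, float), True),
    "currency": (str, False),
}


def validate_invoice(data: dict[str, Any]) -> list[str]:
    """Return a list of validation errors; an empty list means the data is valid."""
    errors = []

    # Required fields and data types.
    for field, (expected, required) in INVOICE_SCHEMA.items():
        value = data.get(field)
        if value is None:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        # bool is a subclass of int, so reject it explicitly; no field here is boolean.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for {field}: got {type(value).__name__}")

    # Constraints, checked only when the value already has a usable type.
    total = data.get("total_amount")
    if isinstance(total, (int, float)) and not isinstance(total, bool) and total < 0:
        errors.append("total_amount must be non-negative")

    invoice_date = data.get("invoice_date")
    if isinstance(invoice_date, str):
        try:
            date.fromisoformat(invoice_date)
        except ValueError:
            errors.append("invoice_date must be an ISO 8601 date")

    return errors
```

Returning every error at once, instead of raising on the first one, gives a retry step more specific feedback to pass back to the model.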

Hands-On Exercise

  • Implement schema validation for invoice JSON output
  • Build a retry mechanism that:
    • Detects invalid output
    • Re-prompts or re-runs the model
  • Encourage students to reason about failure modes
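
One possible shape for the retry mechanism is sketched below. It assumes the model is wrapped in a callable that takes a prompt and returns text, and it uses a deliberately minimal validity check (`REQUIRED_FIELDS`, `check_output`, and `extract_with_retry` are hypothetical names). On failure it re-prompts with the validation errors appended; simply re-running the same prompt is the other option named above.

```python
import json
from typing import Callable

REQUIRED_FIELDS = ("invoice_number", "total_amount")  # hypothetical minimal schema


def check_output(raw: str) -> list[str]:
    """Return a list of problems with the raw model output (empty if valid)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["output must be a JSON object"]
    return [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in data
    ]


def extract_with_retry(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model until its output passes check_output or attempts run out."""
    errors: list[str] = []
    current_prompt = prompt
    for _attempt in range(max_attempts):
        raw = generate(current_prompt)
        errors = check_output(raw)
        if not errors:
            return json.loads(raw)
        # Re-prompt: keep the original instructions and add the specific errors.
        current_prompt = (
            f"{prompt}\n\nYour previous answer was rejected:\n"
            + "\n".join(f"- {error}" for error in errors)
            + "\nReturn only a corrected JSON object."
        )
    raise ValueError(f"no valid output after {max_attempts} attempts: {errors}")
```

Bounding the number of attempts and raising at the end makes the remaining failure mode explicit: students still have to decide what the pipeline does when the model never produces valid output.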

Metadata

Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests