Basic method: LLM quality control without HTML content comparison #121

@PaulaMerle

Description

The basic way to evaluate main-body extraction is to give the LLM only the extracted text and instruct it to judge whether the text is clean, well separated, and understandable.

Acceptance criteria:

  • The first check is made without adding any HTML from the web page; the LLM evaluates only the completeness of the extracted content.
  • If the result is negative, the HTML is added and the LLM completes the extracted content based on the original web page.
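
The two acceptance criteria could be sketched roughly as follows. This is only an illustration under assumptions, not an agreed design: the prompt wording, the `PASS`/`FAIL` answer convention, the `QCResult` type, and the `quality_control` function are all hypothetical, and `llm` stands for any callable that takes a prompt string and returns the model's reply.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt wording; the issue does not prescribe one.
CHECK_PROMPT = (
    "You are checking text extracted from the main body of a web page.\n"
    "Judge whether the text is clean, well separated, understandable, "
    "and complete.\n"
    "Answer with exactly PASS or FAIL on the first line.\n\n"
    "Extracted text:\n{text}"
)

REPAIR_PROMPT = (
    "The extracted text below failed a quality check.\n"
    "Using the original HTML, return only the completed main-body text.\n\n"
    "Extracted text:\n{text}\n\nOriginal HTML:\n{html}"
)


@dataclass
class QCResult:
    passed: bool     # outcome of the text-only check
    text: str        # original text if passed, otherwise the completed text
    used_html: bool  # True if the HTML fallback was needed


def quality_control(text: str, html: str, llm: Callable[[str], str]) -> QCResult:
    """Two-step check: text-only first, HTML added only on a negative result."""
    # Step 1: evaluate the extracted text alone, with no HTML in the prompt.
    verdict = llm(CHECK_PROMPT.format(text=text)).strip()
    first_line = verdict.splitlines()[0].strip().upper() if verdict else ""
    if first_line == "PASS":
        return QCResult(passed=True, text=text, used_html=False)

    # Step 2: the check was negative (or unparseable), so add the HTML and
    # ask the LLM to complete the extracted content from the original page.
    completed = llm(REPAIR_PROMPT.format(text=text, html=html)).strip()
    return QCResult(passed=False, text=completed, used_html=True)
```

Treating an empty or unparseable verdict as a failure is a design choice in this sketch: it errs toward running the HTML fallback rather than silently accepting unverified text.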

Status

In Progress
