Add trafilatura as alternative to readability / mercury / html2text / defuddle

Add a plugin similar to the readability / mercury (aka postlight-parser) / html2text ones, but using trafilatura instead:

https://github.com/adbar/trafilatura


We don't need its crawling/discovery features, only the single URL in -> extracted output features. Ideally it should expose env vars to toggle the various output formats it supports, including:

- markdown
- CSV
- html
- plain text
- any others that might be useful
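A rough sketch of how the per-format toggles could look, assuming one boolean env var per output format. The `TRAFILATURA_OUTPUT_*` names and the default-enabled set are hypothetical (not an existing config convention), and the format strings passed to `trafilatura.extract(..., output_format=...)` should be checked against the installed trafilatura version, since `markdown` and `html` are only available in more recent releases.

```python
import os

# Hypothetical env var names mapped to trafilatura output_format values.
# Neither the names nor the defaults come from an existing config schema.
FORMAT_ENV_VARS = {
    "TRAFILATURA_OUTPUT_MARKDOWN": "markdown",
    "TRAFILATURA_OUTPUT_CSV": "csv",
    "TRAFILATURA_OUTPUT_HTML": "html",
    "TRAFILATURA_OUTPUT_TXT": "txt",
    "TRAFILATURA_OUTPUT_JSON": "json",
    "TRAFILATURA_OUTPUT_XML": "xml",
}
DEFAULT_ENABLED = {"markdown", "txt"}  # assumption: a conservative default


def env_flag(name, default, env=None):
    """Parse a boolean env var; fall back to `default` when unset or empty."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def enabled_formats(env=None):
    """Return the output formats switched on by the environment."""
    return [
        fmt
        for var, fmt in FORMAT_ENV_VARS.items()
        if env_flag(var, fmt in DEFAULT_ENABLED, env)
    ]


def extract_all(html, url=None, env=None):
    """Run trafilatura once per enabled format on already-fetched HTML."""
    import trafilatura  # imported lazily so the config helpers work without it

    results = {}
    for fmt in enabled_formats(env):
        text = trafilatura.extract(html, url=url, output_format=fmt)
        if text:  # extract() returns None when nothing usable is found
            results[fmt] = text
    return results
```

With no variables set, `enabled_formats({})` yields only the assumed defaults; setting e.g. `TRAFILATURA_OUTPUT_CSV=true` adds CSV without touching the others.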

We should wire it up to take in the existing HTML already extracted by the singlefile output, chrome dom output, wget output, etc., similar to readability / mercury, instead of re-downloading the page from scratch.
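One possible shape for reusing already-saved HTML, sketched under assumptions: the glob patterns below (`singlefile/*.html`, `dom/*.html`, `wget/**/*.html`) are placeholders for wherever those extractors actually write their output, and the priority order is a guess. The only trafilatura call used is `trafilatura.extract()` on an HTML string, so nothing is fetched over the network.

```python
from pathlib import Path

# Hypothetical output locations, in priority order. The real paths depend on
# how the singlefile / dom / wget plugins lay out a snapshot directory.
CANDIDATE_PATTERNS = ("singlefile/*.html", "dom/*.html", "wget/**/*.html")


def find_existing_html(snapshot_dir, patterns=CANDIDATE_PATTERNS):
    """Return the first non-empty HTML file produced by an earlier extractor."""
    root = Path(snapshot_dir)
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            if path.is_file() and path.stat().st_size > 0:
                return path
    return None


def extract_from_snapshot(snapshot_dir, url=None, output_format="markdown"):
    """Extract content from saved HTML; returns None if no source is found."""
    import trafilatura  # imported lazily so file discovery works without it

    source = find_existing_html(snapshot_dir)
    if source is None:
        return None  # caller decides whether to skip or fall back to fetching
    html = source.read_text(encoding="utf-8", errors="replace")
    # `url` is passed only as metadata for the extractor; no download happens.
    return trafilatura.extract(html, url=url, output_format=output_format)
```

Returning `None` when no saved HTML exists leaves the skip-or-fetch decision to the caller, which keeps the plugin from silently re-downloading pages.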


Add trafilatura as alternative to readability / mercury / html2text / defuddle #5
