Add a plugin similar to the readability / mercury (aka postlight-parser) / html2text ones, but using trafilatura instead:
https://github.com/adbar/trafilatura
We don't need its crawling/discovery features, only the single URL in -> extracted output features. Ideally it should expose env vars to toggle the various outputs it supports, including:
- markdown
- CSV
- html
- plain text
- any others that might be useful
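One way the per-format toggles could look, as a rough sketch only. The env var names (`TRAFILATURA_OUTPUT_*`), the defaults, and the helper names are all made up for illustration and are not existing config options; the format strings are the values trafilatura's `extract()` accepts for `output_format` in recent releases (`html` and `markdown` may not be available in older versions).

```python
import os

# Hypothetical env var name -> (trafilatura output_format value, file extension).
# The env var names are placeholders, not existing settings.
FORMAT_TOGGLES = {
    "TRAFILATURA_OUTPUT_TXT": ("txt", "txt"),
    "TRAFILATURA_OUTPUT_MARKDOWN": ("markdown", "md"),
    "TRAFILATURA_OUTPUT_HTML": ("html", "html"),
    "TRAFILATURA_OUTPUT_CSV": ("csv", "csv"),
    "TRAFILATURA_OUTPUT_JSON": ("json", "json"),
    "TRAFILATURA_OUTPUT_XML": ("xml", "xml"),
}

# Assumed defaults: only plain text and markdown are on unless overridden.
DEFAULT_ON = {"TRAFILATURA_OUTPUT_TXT", "TRAFILATURA_OUTPUT_MARKDOWN"}


def env_flag(name, default, environ=None):
    """Read a boolean toggle from the environment, falling back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def enabled_formats(environ=None):
    """Return (output_format, extension) pairs for every enabled toggle."""
    return [
        fmt_ext
        for name, fmt_ext in FORMAT_TOGGLES.items()
        if env_flag(name, name in DEFAULT_ON, environ)
    ]
```

Passing `environ` explicitly keeps the helper easy to test; in the plugin itself it would just read `os.environ`.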
We should wire it up to take in the existing HTML already extracted by the singlefile output, chrome dom output, wget output, etc., the same way readability / mercury do, instead of re-downloading the page from scratch.
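A minimal sketch of the "reuse existing HTML" part, under some assumptions: the filenames `singlefile.html` and `output.html` are guesses at what the singlefile and chrome dom extractors write into the snapshot directory, and the function names are hypothetical. The wget output is left out because its path depends on the URL. `trafilatura.extract()` accepts an HTML string directly, so no network request is needed.

```python
from pathlib import Path

# Assumed filenames inside a snapshot directory, in order of preference.
CANDIDATE_SOURCES = ("singlefile.html", "output.html")


def find_html_source(snapshot_dir):
    """Return the first existing, non-empty HTML file, or None if there is none."""
    snapshot_dir = Path(snapshot_dir)
    for name in CANDIDATE_SOURCES:
        path = snapshot_dir / name
        if path.is_file() and path.stat().st_size > 0:
            return path
    return None


def extract_from_snapshot(snapshot_dir, url, output_format="markdown"):
    """Run trafilatura on already-saved HTML instead of re-downloading the page."""
    source = find_html_source(snapshot_dir)
    if source is None:
        return None  # nothing to extract from; caller can skip or fall back

    # Imported lazily so the rest of the module works without trafilatura installed.
    import trafilatura

    html = source.read_text(encoding="utf-8", errors="replace")
    # extract() returns the result as a string, or None if nothing usable was found.
    return trafilatura.extract(html, url=url, output_format=output_format)
```

Passing `url` lets trafilatura fill in URL-derived metadata even though the HTML comes from disk.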