A Node.js tool that extracts HTML posts from webpages using Puppeteer, converts them to Markdown, and saves them.
I haven't really tested this and there are many things missing, but it works for my use case.
- Clone the repository
- Run `npm install`
- Run the tool, for example: `node index.js --url="https://justmarkup.com" --postSelector=".main .article h2 a" --titleSelector=".article h1" --contentSelector=".article .entry-content" --dir="/posts/"`

| Option | Default | Description |
|---|---|---|
| --url | https://justmarkup.com | The entry page containing links to the posts | 
| --postSelector | .main .article h2 a | The selector for all the links to your posts | 
| --titleSelector | .article h1 | The selector for the title of your post | 
| --contentSelector | .article .entry-content | The selector for the content wrapper of your post | 
| --dir | /posts/ | The directory where the posts should be saved |
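
To illustrate how the options in the table fall back to their defaults, here is a minimal sketch of parsing `--key=value` arguments. It is not taken from the tool's source: `DEFAULTS` and `parseOptions` are hypothetical names, and the real `index.js` may handle arguments differently.

```javascript
// Hypothetical sketch, not the tool's actual implementation.
// Default values copied from the options table above.
const DEFAULTS = {
  url: 'https://justmarkup.com',
  postSelector: '.main .article h2 a',
  titleSelector: '.article h1',
  contentSelector: '.article .entry-content',
  dir: '/posts/',
};

// Merge `--key=value` arguments over the defaults.
// Unknown keys and arguments in any other form are ignored.
function parseOptions(argv) {
  const options = { ...DEFAULTS };
  for (const arg of argv) {
    // Split at the first "=" so values may themselves contain "=".
    const match = /^--([^=]+)=(.*)$/s.exec(arg);
    if (match && Object.prototype.hasOwnProperty.call(DEFAULTS, match[1])) {
      options[match[1]] = match[2];
    }
  }
  return options;
}

// Example: print the effective options for the current invocation.
console.log(parseOptions(process.argv.slice(2)));
```

Any option left out of the command line keeps the default from the table, so passing only `--url` and `--dir` would still use the default selectors.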