Conversion from block mark-up to/from Markdown #11

@vadimkantorov

Feature Request

Describe your use case and the problem you are facing

Export Gutenberg pages from an existing WordPress installation to Markdown, for interop with other content engines or with LLMs, which tend to prefer Markdown. Import would also be great. My use case is converting to Markdown the block markup strings found in WXR files exported from an existing WP installation.
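To make the input concrete: block markup wraps each block's HTML in delimiter comments such as `<!-- wp:heading {"level":3} -->` and `<!-- /wp:heading -->`, with a self-closing form like `<!-- wp:latest-posts /-->`. A minimal Python sketch of splitting such a string into top-level blocks follows. The function name `parse_blocks` and the returned dict shape are assumptions made for illustration, not an existing API, and a real converter would more likely reuse WordPress's own block parser.

```python
import json
import re

# Matches opening, closing, and self-closing ("void") block delimiter comments.
BLOCK_COMMENT = re.compile(
    r"<!--\s+(?P<closer>/)?wp:(?P<name>[a-z][a-z0-9_-]*(?:/[a-z][a-z0-9_-]*)?)\s+"
    r"(?:(?P<attrs>\{.*?\})\s+)?(?P<void>/)?-->",
    re.DOTALL,
)


def parse_blocks(markup):
    """Return top-level blocks as dicts (hypothetical shape: name, attrs, html).

    Nested blocks stay inside the parent's "html"; HTML outside any block
    is ignored in this sketch.
    """
    blocks = []
    depth = 0
    start = None  # (name, attrs, offset where the inner HTML begins)
    for m in BLOCK_COMMENT.finditer(markup):
        name = m.group("name")
        if "/" not in name:
            name = "core/" + name  # names without a namespace belong to core
        attrs = json.loads(m.group("attrs") or "{}")
        if m.group("closer"):
            depth = max(depth - 1, 0)
            if depth == 0 and start is not None:
                inner = markup[start[2]:m.start()].strip()
                blocks.append({"name": start[0], "attrs": start[1], "html": inner})
                start = None
        elif m.group("void"):
            if depth == 0:
                blocks.append({"name": name, "attrs": attrs, "html": ""})
        else:
            if depth == 0:
                start = (name, attrs, m.end())
            depth += 1
    return blocks
```

The regex approach is only meant to show the shape of the data; it does not validate the markup and would mis-parse malformed attribute JSON.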

Describe the solution you'd like

Versatile, configurable export from block markup to Markdown. Custom attributes could optionally be exported via the generic directive syntax proposed for CommonMark, which remark already supports through the remark-directive plugin. Even if the conversion is not 100% faithful, it is already useful, e.g. for making existing WordPress content consumable in good form for RAG / LLMs.

There are many existing Markdown plugins for WordPress, but maintenance is fragmented and plenty of exporters are abandoned. There is a need for a well-tested, maintained official one that is also versatile and scriptable, so in my opinion it is a good fit for a CLI command.

Ideally, there would be handlers for the standard block types, plus a catch-all that optionally exports verbatim HTML for blocks that have no registered Markdown converter. Having some HTML in Markdown is fine, as the user can then post-process the Markdown files with remark/rehype pipelines.
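The handler-plus-catch-all idea could look roughly like the Python sketch below. Everything named here (`HANDLERS`, `handles`, `to_markdown`, the `keep_unknown_html` option, and the block dict shape with `name`, `attrs`, `html`) is hypothetical and only illustrates the dispatch; the tag stripping in `plain_text` is deliberately naive and drops inline formatting.

```python
import html
import re

# Registry mapping block names to converter functions (hypothetical design).
HANDLERS = {}


def handles(name):
    """Decorator that registers a Markdown converter for one block type."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register


def plain_text(fragment):
    # Naive: strips all tags and decodes entities, losing inline formatting.
    return html.unescape(re.sub(r"<[^>]+>", "", fragment)).strip()


@handles("core/paragraph")
def paragraph(block):
    return plain_text(block["html"])


@handles("core/heading")
def heading(block):
    level = block["attrs"].get("level", 2)  # assumes level 2 when unset
    return "#" * level + " " + plain_text(block["html"])


def to_markdown(blocks, keep_unknown_html=True):
    """Convert blocks; unknown types fall back to verbatim HTML or are dropped."""
    parts = []
    for block in blocks:
        handler = HANDLERS.get(block["name"])
        if handler is not None:
            parts.append(handler(block))
        elif keep_unknown_html:
            parts.append(block["html"])  # catch-all: pass the HTML through
    return "\n\n".join(part for part in parts if part)
```

With a registry like this, a plugin or a user script could add converters for custom blocks without touching the core exporter, and the fallback keeps unknown content available for later remark/rehype processing.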

IMO such an exporter should accept that there is no single right way to export to Markdown, so some configurability is to be expected: users may want different flavors of Markdown. For example, preserving text color as directives and dropping it altogether are both good and useful, depending on the user's use case.
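As an illustration of that kind of switch, here is a small Python sketch for the text color case. The `ExportOptions` class, its `color` field, and the directive name `span` are all assumptions; the directive output follows the `:name[content]{key="value"}` text-directive form that remark-directive parses, and the text is not escaped in this sketch.

```python
from dataclasses import dataclass


@dataclass
class ExportOptions:
    # Hypothetical option: "drop", "directive", or "html".
    color: str = "drop"


def render_colored_text(text, color, options):
    """Render a colored text run according to the chosen Markdown flavor."""
    if color is None or options.color == "drop":
        return text  # plain Markdown, color information discarded
    if options.color == "directive":
        # Text directive; "span" is an illustrative name, not a standard one.
        return f':span[{text}]{{color="{color}"}}'
    if options.color == "html":
        return f'<span style="color:{color}">{text}</span>'
    raise ValueError(f"unknown color mode: {options.color!r}")
```

The same pattern would extend to other attributes (alignment, font size, custom classes), with each option choosing between dropping, directive output, or inline HTML.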

Related:

For now, I had to roll my own tool, https://github.com/vadimkantorov/wxr, which exports block markup from WXR to HTML files (preserving attributes) and then converts them to Markdown (also allowing some custom URL / image URL post-processing, downloading the linked images, etc.). In this converter I often represent unknown blocks as:

---
blah blah some info about a block without a standard way to represent in Markdown (not frontmatter)
---

more markdown

So essentially, a pair of --- dividers encloses some block info. A custom remark/rehype pipeline could then interpret or render these in a custom way if the user so desires.
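A minimal Python sketch of emitting such an enclosed section is below. The function name `fence_unknown_block` and the JSON layout between the dividers are assumptions made for illustration; they are not the format the wxr tool actually writes.

```python
import json


def fence_unknown_block(name, attrs):
    """Wrap info about an unconvertible block in a pair of --- dividers."""
    # Hypothetical payload: one line of JSON with the block name and attributes.
    info = json.dumps({"block": name, "attrs": attrs}, ensure_ascii=False, sort_keys=True)
    return f"---\n{info}\n---"
```

One caveat with this representation: a stock CommonMark parser reads a text line directly followed by --- as a setext heading, so the custom pipeline has to recognize these sections before normal parsing, or the exporter has to leave blank lines around the info.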
