
Ampersand in link causes different markdown to be generated vs. non-ampersand link #441

@lost-theory

Version: html2text==2025.4.15.

I want to rely on the behavior where <a href="http://link/">http://link/</a> gets rendered in Markdown as <http://link/>.

Unfortunately, if the link contains an ampersand, it is rendered as [link](link), even though the href and the link text are exactly the same:

>>> from html2text import HTML2Text as H
>>> h = H()

# good!
>>> h.handle('<a href="https://foo/">https://foo/</a>')
'<https://foo/>\n\n'

# a bare ampersand in the URL causes []() link syntax, but I guess that's expected since "&y" in the link text turns into "&y;"
>>> h.handle('<a href="https://foo/?x=1&y=2">https://foo/?x=1&y=2</a>')
'[https://foo/?x=1&y;=2](https://foo/?x=1&y=2)\n\n'

# but even when the ampersand is escaped as &amp;, and both link and text end up being exactly the same string, []() is still generated for the link
>>> h.handle('<a href="https://foo/?x=1&amp;y=2">https://foo/?x=1&amp;y=2</a>')
'[https://foo/?x=1&y=2](https://foo/?x=1&y=2)\n\n'
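
Until this is addressed in html2text itself, one possible workaround is to post-process the generated Markdown and collapse any `[url](url)` pair whose text and target are the identical absolute URL back into `<url>`. This is only a sketch: `collapse_identical_links` is a hypothetical helper name, not part of html2text, and it assumes http/https URLs that contain no whitespace, brackets, or parentheses.

```python
import re

# Matches [URL](URL) only when both halves are the same http(s) URL.
# The backreference (?P=url) enforces that text and target are identical.
_IDENTICAL_LINK = re.compile(r"\[(?P<url>https?://[^\]\s()]+)\]\((?P=url)\)")


def collapse_identical_links(markdown: str) -> str:
    """Rewrite [url](url) as <url>; leave every other link untouched."""
    return _IDENTICAL_LINK.sub(lambda m: "<" + m.group("url") + ">", markdown)


# Hypothetical usage on the output shown in the issue:
rendered = "[https://foo/?x=1&y=2](https://foo/?x=1&y=2)\n\n"
print(collapse_identical_links(rendered))
```

Links whose text differs from the target, such as the `&y;` case above, are left as they are, so the helper only touches output that would have qualified for the `<...>` form anyway.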
