Wikipedia regexp can produce malformed HTML #3

@acdha

The stock wikipedia rule is:

http://\S*.wikipedia.org/wiki/\S*

I encountered problems where correct markup such as:

<a href="http://en.wikipedia.org/wiki/Bookmarklet">bookmarklet</a>.

was incorrectly converted into something like the output below. Browsers either silently ignore the malformed closing tag or drop or jumble all of the output up to the next </a>:

<a href="http://en.wikipedia.org/wiki/Bookmarklet">bookmarklet</a.>
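The overreach can be reproduced in isolation. The following is a minimal sketch, assuming the rule is evaluated with Python `re` semantics (the issue does not say which engine or replacement step the project uses, so only the match itself is shown). Because `\S*` is greedy, the match does not stop at the end of the URL but runs on to the next whitespace character, taking the closing quote, the link text, the `</a>` and the trailing period with it.

```python
import re

# Stock rule, copied verbatim from this issue.
STOCK_RULE = re.compile(r"http://\S*.wikipedia.org/wiki/\S*")

# Markup from the report above.
MARKUP = '<a href="http://en.wikipedia.org/wiki/Bookmarklet">bookmarklet</a>.'

# \S* is greedy, so the match runs to the next whitespace
# (here there is none, so it runs to the end of the string).
matched = STOCK_RULE.search(MARKUP).group(0)
print(matched)
# prints: http://en.wikipedia.org/wiki/Bookmarklet">bookmarklet</a>.
```

Anything that treats the whole match as the URL will therefore also rewrite the markup that follows it.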

I'm currently using this on my site but it probably needs further testing to avoid other silent failures:

http://\S*.wikipedia.org/wiki/[^ .<]+
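As a starting point for that further testing, here is a small sketch of how the proposed rule behaves on a few inputs, again assuming Python `re` semantics. The helper `first_match` and the second and third inputs are hypothetical and chosen only for illustration. The rule now stops before `</a>`, but the character class still lets `"` and `>` through, and it cuts off any article title that contains a period.

```python
import re

# Proposed rule, copied verbatim from this issue.
PROPOSED_RULE = re.compile(r"http://\S*.wikipedia.org/wiki/[^ .<]+")


def first_match(text):
    """Return the first substring the proposed rule matches, or None."""
    m = PROPOSED_RULE.search(text)
    return m.group(0) if m else None


# The markup from the report, plus two illustrative (hypothetical) inputs.
in_anchor = '<a href="http://en.wikipedia.org/wiki/Bookmarklet">bookmarklet</a>.'
in_prose = "See http://en.wikipedia.org/wiki/Bookmarklet."
dotted_title = "http://en.wikipedia.org/wiki/St._Louis"

print(first_match(in_anchor))     # http://en.wikipedia.org/wiki/Bookmarklet">bookmarklet
print(first_match(in_prose))      # http://en.wikipedia.org/wiki/Bookmarklet
print(first_match(dotted_title))  # http://en.wikipedia.org/wiki/St
```

The first result shows the match no longer reaches the closing tag, though it still spills past the `href` value. The last result suggests that excluding `.` outright may be too strict for titles with periods, which is one of the silent failures worth checking before adopting the rule.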
