HTML to Markdown: data is generated twice #428

@caoshl

Description

  • Version by `html2text --version`
  • Test script
  • Python version by `python --version`

I downloaded the HTML from
https://www.stats.gov.cn/sj/zxfb/202409/t20240914_1956486.html
In the downloaded HTML file the data appears only once,
but in the generated Markdown text the same data appears twice.

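Script:
from html2text import HTML2Text

def convert_html_to_markdown(html_text: str, base_url: str) -> str:
    # Convert HTML to Markdown; relative links are resolved against base_url.
    h = HTML2Text(baseurl=base_url)
    h.ignore_links = False  # keep hyperlinks in the output
    markdown_text = h.handle(html_text)
    return markdown_text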
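
To narrow down which part of the output is duplicated, a small check along the following lines might help. It is only a sketch: `find_repeated_lines` and its `min_length` threshold are hypothetical names introduced here, not part of html2text, and it compares whole lines, so content that is wrapped differently the second time would not be caught.

```python
from collections import Counter


def find_repeated_lines(markdown_text: str, min_length: int = 20) -> dict:
    """Return lines that occur more than once, mapped to their counts.

    Hypothetical helper (not part of html2text). Lines shorter than
    min_length after stripping are ignored, so blank lines and short
    table cells do not show up as false positives.
    """
    counts = Counter(
        line.strip()
        for line in markdown_text.splitlines()
        if len(line.strip()) >= min_length
    )
    return {line: n for line, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Placeholder text standing in for the converter's output.
    sample = "\n".join(
        [
            "An example paragraph that shows up twice.",
            "",
            "short",
            "",
            "An example paragraph that shows up twice.",
        ]
    )
    for line, n in find_repeated_lines(sample).items():
        print(f"{n}x: {line}")
```

With the script above, this could be called as `find_repeated_lines(convert_html_to_markdown(html_text, base_url))`. Running the same check on the raw HTML text would show whether the repetition is already present in the downloaded file or only appears after conversion.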
