- Version by `html2text --version`
- Test script
- Python version `python --version`

Get the HTML from
https://www.stats.gov.cn/sj/zxfb/202409/t20240914_1956486.html

In the downloaded HTML file the data appears only once,
but in the Markdown text the same data appears twice.
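Script:

    from html2text import HTML2Text


    def convert_html_to_markdown(html_text: str, base_url: str) -> str:
        # Resolve relative links against base_url and keep links in the output.
        h = HTML2Text(baseurl=base_url)
        h.ignore_links = False
        markdown_text = h.handle(html_text)
        return markdown_text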
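
To make the "once in the HTML, twice in the Markdown" observation easier to check, here is a minimal sketch that counts how often a given snippet occurs in the visible text of the HTML and in the converted Markdown. The helper names `count_in_html` and `count_in_markdown` are hypothetical and not part of html2text. The sketch uses only the standard library and assumes that text outside `<script>` and `<style>` counts as visible, so content hidden by CSS would still be counted.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_in_html(html_text, needle):
    """Hypothetical helper: occurrences of needle in the HTML's text nodes."""
    parser = _VisibleText()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks).count(needle)


def count_in_markdown(markdown_text, needle):
    """Hypothetical helper: occurrences of needle in the Markdown output."""
    return markdown_text.count(needle)
```

Calling `count_in_html(html_text, value)` and `count_in_markdown(markdown_text, value)` with one of the duplicated figures as `value` should show whether the duplication is introduced by the conversion or is already present in the page source.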