@kelockhart

Previously, MathML markup was passed through to the export without any special handling. This is especially problematic in LaTeX formats (including BibTeX), because the markup breaks LaTeX compilation on the user's end.

Example (2020EPJC...80...96D; 2024EPJC...84..487P is another example):

@ARTICLE{Dhaygude_2020-2020,
       author = {{Dhaygude}, Akanksha and {Desai}, Shantanu},
        title = "{Generalized Lomb{\textendash}Scargle analysis of <inline-formula id=``IEq1''><mml:math><mml:mrow><mml:msup><mml:mrow></mml:mrow><mml:mn>36</mml:mn></mml:msup><mml:mi mathvariant=``normal''>Cl</mml:mi></mml:mrow></mml:math></inline-formula> decay rate measurements at PTB and BNL}",
      journal = {EPJC},
     keywords = {Astrophysics - High Energy Astrophysical Phenomena, Astrophysics - Instrumentation and Methods for Astrophysics, Nuclear Experiment},
         year = 2020,
        month = feb,
       volume = {80},
       number = {2},
          eid = {96},
        pages = {96},
          doi = {10.1140/epjc/s10052-020-7683-6},
archivePrefix = {arXiv},
       eprint = {1912.06970},
 primaryClass = {astro-ph.HE},
       adsurl = {https://ui.adsabs.harvard.edu/abs/2020EPJC...80...96D},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

This PR does the following:

  • For LaTeX formats (including BibTeX), the MathML markup is converted to LaTeX markup (note: this is hard to see in the code, but it applies to anything that uses the encode_laTex function).
  • For tagged formats (excluding BibTeX), XML formats, and IEEE, the MathML markup is converted to plain text.
  • For text formats (excluding IEEE), the MathML markup is left as-is, because the UI renders these formats and the markup displays properly there.
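
To make the three behaviors above concrete, here is a minimal sketch of the dispatch. It is not the code in this PR: the format names, the `LATEX_FORMATS` and `PLAIN_TEXT_FORMATS` groupings, and the `convert_mathml`, `to_latex`, and `to_text` helpers are all hypothetical, and the LaTeX branch only handles the handful of MathML elements that appear in the example title.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical format families; the real service's groupings may differ.
LATEX_FORMATS = {"bibtex", "aastex"}
PLAIN_TEXT_FORMATS = {"ris", "endnote", "jatsxml", "ieee"}

# Matches one <inline-formula>...</inline-formula> span in a field value.
FORMULA_RE = re.compile(r"<inline-formula\b[^>]*>.*?</inline-formula>", re.DOTALL)


def to_text(node):
    """Plain text: keep only the character data, drop every tag."""
    return "".join(node.itertext())


def to_latex(node):
    """Very small MathML-to-LaTeX mapping (msup, msub, upright mi only)."""
    children = list(node)
    if node.tag == "msup" and len(children) == 2:
        return "{%s}^{%s}" % (to_latex(children[0]), to_latex(children[1]))
    if node.tag == "msub" and len(children) == 2:
        return "{%s}_{%s}" % (to_latex(children[0]), to_latex(children[1]))
    if node.tag == "mi" and node.get("mathvariant") == "normal":
        return r"\mathrm{%s}" % (node.text or "")
    # Any other element: its own text followed by its converted children.
    return (node.text or "") + "".join(
        to_latex(child) + (child.tail or "") for child in children
    )


def convert_mathml(text, export_format):
    """Convert or keep MathML in `text` depending on the export format family."""
    if export_format in LATEX_FORMATS:
        def render(node):
            return "$" + to_latex(node) + "$"
    elif export_format in PLAIN_TEXT_FORMATS:
        render = to_text
    else:
        return text  # text formats: leave the markup for the UI to render

    def replace(match):
        # Drop the undeclared "mml:" prefix so the fragment parses as XML.
        fragment = re.sub(r"(</?)mml:", r"\1", match.group(0))
        return render(ET.fromstring(fragment))

    return FORMULA_RE.sub(replace, text)
```

With the title from the example above (using ordinary double quotes in the attributes), this sketch yields `${}^{36}\mathrm{Cl}$` for a LaTeX format and `36Cl` for a plain-text format, and returns the input unchanged for any other format.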

Notes:

  • If we decide to render IEEE as we do the other text formats, we should remove the MathML to plain text conversion for it.
  • The JATS-XML standard can handle MathML markup, but I converted it to plain text anyway, to be consistent with the other XML formats and because I'm not sure of the end use case. We could remove the conversion, but that would need some additional work: the serialization process was converting all of the MathML markup to HTML entities, so we weren't producing usable markup in this format anyway.
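
As an illustration of the entity problem in the last note, the sketch below uses Python's standard `xml.etree.ElementTree`. Whether the export service's serializer behaves the same way is an assumption, and the element names are only illustrative. It shows that markup assigned as element text is escaped on serialization, while markup parsed first and attached as a child element survives intact.

```python
import xml.etree.ElementTree as ET

markup = "<math><mn>36</mn></math>"

# Assigning the markup as text: the serializer escapes the angle brackets.
as_text = ET.Element("article-title")
as_text.text = markup
escaped = ET.tostring(as_text, encoding="unicode")

# Parsing the markup first and appending it as a child keeps real elements.
as_child = ET.Element("article-title")
as_child.append(ET.fromstring(markup))
embedded = ET.tostring(as_child, encoding="unicode")

print(escaped)
print(embedded)
```

Keeping MathML in the JATS-XML output would therefore mean building the formula as elements rather than passing it through as a string, which is the "additional work" mentioned above.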

@kelockhart requested a review from @tjacovich on July 2, 2025, 21:29
@coveralls
coveralls commented Jul 2, 2025

Coverage Status

coverage: 96.662% (-0.4%) from 97.106%
when pulling bcfa8ba on kelockhart:mathml
into 83521f8 on adsabs:master.

@tjacovich left a comment

I think this seems reasonable; just one question about where the LaTeX converter lives.

@kelockhart merged commit 8edaa23 into adsabs:master on Jul 3, 2025
2 checks passed