An XSLT library to normalize MathML equations with heuristic methods.
The authoring of math equations is an error-prone process, especially with WYSIWYG editors such as MathType and Microsoft Word Equation Editor. For example, authors tend to write symbols accidentally in text mode instead of changing the font-style to normal. This results most likely in wrong MathML markup for the symbol, like mtext where mi is appropriate.
Consider this MathML equation with wrong markup:
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mtext>E=m</mtext>
<msubsup>
<mtext>c</mtext>
<mi> </mi>
<mn>2</mn>
</msubsup>
</math>After mml-normalize, the mtext was resolved and the text was properly tagged with mi and mo elements. Furthermore, the msubsup was replaced with msup:
<math xmlns="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<mrow>
<mi mathvariant="normal">E</mi>
<mo>=</mo>
<mi mathvariant="normal">m</mi>
</mrow>
<msup>
<mi mathvariant="normal">c</mi>
<mn>2</mn>
</msup>
</math>mml-normalize contains two XSLT modes. First, you should invoke mml2tex-grouping and afterwards mml2tex-preprocess.
$ saxon -xsl:mml-normalize/xsl/mml-normalize.xsl -s:eq.xml -o:eq-nrmlzd.xml -im:mml2tex-grouping
$ saxon -xsl:mml-normalize/xsl/mml-normalize.xsl -s:eq.xml -o:eq-nrmlzd.xml -im:mml2tex-preprocess
In our experience of working with poorly formatted equations, this XSLT improved the markup plenty of times. However, due to the nature of heuristic methods, there can be cases where this XSLT is no help at all or changes something for the worse. We suggest to include Schematron checking and a certain level of math proofreading in your process.