Recent improvements to the MathML export
Frédéric WANG
fred.wang at free.fr
Thu Jul 4 08:31:57 PDT 2013
Hi all,
As announced earlier on this mailing list, I've recently been submitting
patches, mostly to improve the MathML export. A couple of things remain
to be reviewed/merged but after that I think LibreOffice MathML export
will be reasonably good (although of course some MathML experts will say
it can still be improved). At least, all the mathematical formatting
done via the StarMath language should now be preserved when exported to
MathML and Firefox and MathJax should be able to render the MathML
correctly. I’ve also fixed some bugs with the XHTML export when the
document contains mathematical formulas.
In order to test the new MathML export, to verify how other rendering
engines could handle the MathML markup and to compare visual
improvements, I’ve written a small ODT document that contains an
overview of LibreOffice Math features:
http://www.maths-informatique-jeux.com/international/LibreOfficeMath.zip
The archive contains two directories with the results for LibreOffice
4.0 (the release version installed on my system) and LibreOffice 4.2
(with all my patches applied). Each directory contains the ODT file and
the exported XHTML page. You can open that page with your browser and
there is also a version that uses MathJax. I’ve included PDF documents
showing the rendering with LibreOffice, Gecko and MathJax. I’ve also
extracted the MathML files so that you can compare the code or verify
the rendering in other engines. For example, to display all the MathML
files in gtkmathview, try:
  for f in 4.2/MathML/*; do mathmlviewer "$f"; done
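To compare the extracted markup itself rather than the rendering, a small script can count which MathML elements each version emits. This is only a minimal sketch: the directory name 4.0/MathML is an assumption (only 4.2/MathML is named above), and count_elements is a hypothetical helper, not something shipped in the archive.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix, e.g. '{http://...}mo' -> 'mo'."""
    return tag.rsplit("}", 1)[-1]


def count_elements(directory):
    """Count element names over all well-formed XML files in a directory."""
    counts = Counter()
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            root = ET.parse(str(path)).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        counts.update(local_name(el.tag) for el in root.iter())
    return counts


if __name__ == "__main__":
    # Assumed layout: one MathML directory per LibreOffice version.
    for version in ("4.0", "4.2"):
        directory = Path(version) / "MathML"
        if directory.is_dir():
            print(version, dict(count_elements(directory)))
```

Running it from the top of the unpacked archive prints one tally per version, which makes differences such as the number of mfenced or mo elements easy to spot.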
For the record, here is a (certainly non-exhaustive) list of things to
improve:
1) Expressions like "1+3x" are still incorrectly interpreted as "{1+3}x"
(bug 66200). Well, I guess that won't affect the rendering but that
might confuse accessibility tools.
2) Arbitrary Unicode characters are generally interpreted as
identifiers. If you want to define an operator, you must use the boper,
uoper or oper commands. You can also create your own %xxx command and
the MathML export will use the MathML operator dictionary to guess if
that's an operator.
3) As I read the source code, Math does not seem to handle non-BMP
characters (probably the reason for bug 66333)
4) For vec and widevec, the combining character "U+20D7 COMBINING RIGHT
ARROW ABOVE" is used instead of "U+2192 RIGHTWARDS ARROW". The latter is
recommended by the MathML spec and may render better in MathML
rendering engines but the former renders better in LibreOffice Math at
the moment. Perhaps this could be changed when stretching is implemented
correctly (cf bug 32362 comment 21)
5) The interpretation of alignment in Math is weird. Concretely, if you
write "matrix{alignl blah blah blah blah blah blah ## alignl 3 over 10 +
7 over 10 = 1 }" to align the left side of the rows, this applies
recursively and the numerator/denominator of the fractions will be
aligned left too. In the MathML export, I only apply the alignment to
the specified node, not to descendants. Better alignment was one of the
four top missing features listed by Thomas Lange.
6) The MathML export does not take into account the properties from the
Format menu (except "Text mode"). Font and alignment properties could
probably be handled in MathML. Other general spacing rules would be less
accurate and reliable if emulated with MathML spacing elements; it's
better to rely on the OpenType MATH table for that anyway.
7) The MathML import is not good and I suspect it is unlikely to be
successfully used to import MathML formulas generated by third-party
tools. LibreOffice Math is not even always able to correctly import the
MathML it generates (when you remove the StarMath annotation). I added
new MathML constructions in the export so rendering properties are now
preserved at export but are still lost when importing it back (again,
when you remove the StarMath annotation). I don't plan to improve this,
especially if people always keep the StarMath annotation and if we move
to MathML as the reference format in the future.
8) I added new MathML constructions that are from MathML2 and MathML3. I
guess the menu should use the generic term "MathML" instead of
specifying the obsolete version "MathML 1.0".
9) I've proposed two enhancements: an option to use MathJax (bug 66287)
and an HTML5 export filter (bug 66044). I've submitted a WIP patch for
the first and I suspect the second could reuse the existing XHTML
filter. But I don't know where the code to call these XSLT style sheets
and handle the dialogs is located.
10) As Khaled Hosny mentioned in a previous message, inserting a math
object in Word is not really convenient. Also, setting inline/display
mode by using Format => Text is a bit tedious. In general, integration
of formulas in the surrounding text (line breaking, spacing, alignment)
is not really good. The other missing features listed by Thomas Lange
were about this too.
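To make item 1 concrete, here is a minimal Python sketch of the two groupings of "1+3x" as nested mrow elements. The exact markup the export produces is an assumption here (it may, for instance, insert extra operators), and the helper names leaf, row, visible_text and grouping are hypothetical. The point is only that both trees carry the same visible text while their structure differs, which is what an accessibility tool would see.

```python
import xml.etree.ElementTree as ET


def leaf(tag, text):
    """Build a token element such as <mn>1</mn>."""
    el = ET.Element(tag)
    el.text = text
    return el


def row(*children):
    """Wrap children in an <mrow>."""
    el = ET.Element("mrow")
    el.extend(children)
    return el


def visible_text(el):
    """Concatenate all token text, ignoring the grouping."""
    return "".join(el.itertext())


def grouping(el):
    """Show the mrow nesting with StarMath-like braces."""
    if el.tag == "mrow":
        return "{" + " ".join(grouping(child) for child in el) + "}"
    return el.text


# Grouping described in bug 66200: the sum is grouped before the product.
current = row(row(leaf("mn", "1"), leaf("mo", "+"), leaf("mn", "3")),
              leaf("mi", "x"))
# Grouping one would expect mathematically: 1 + (3x).
expected = row(leaf("mn", "1"), leaf("mo", "+"),
               row(leaf("mn", "3"), leaf("mi", "x")))

print(visible_text(current), grouping(current))
print(visible_text(expected), grouping(expected))
```

Both calls to visible_text give the same string, so a renderer that simply lays the tokens out in a row shows no difference; only the output of grouping differs.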
For completeness and for people interested in details, here is the list
of bugs I've worked on:
- Variable with coefficients not italicized
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=55853>
- Wide accents are not stretchy when exported to MathML
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66024>
- Blanks are not correctly exported to MathML
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66075>
- Improve grouping of binary operators
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66081>
- Commands wideslash, widebslash and overstrike are not correctly
exported to MathML
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66086>
- Many mathematical symbols are incorrectly exported as <mo> operators
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66088>
- MathML export: avoid using combining characters for accents and
diacritical marks
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66276>
- Use columnalign to implement matrix alignment
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66277>
- MathML export does not distinguish between inline and display
equations <https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66278>
- MathML export: use the operator dictionary
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66279>
- underbrace and overbrace are not stretchy when exported to MathML
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66281>
- Replace <mfenced> elements by equivalent <mrow>+<mo> constructions
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66282>
- phantom and stylistic commands generate useless elements
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66283>
- Incorrect unicode characters used in the "Brackets"
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66416>
- Incorrect removal of last line in SmXMLExport::ExportTable
<https://bugs.freedesktop.org/show_bug.cgi?id=66575>
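As an illustration of the mfenced replacement (bug 66282), the following Python sketch expands an mfenced element into the equivalent mrow+mo form described in the MathML 3 specification. It is not the code from the patch: expand_mfenced is a hypothetical helper, and the sketch assumes elements without a namespace prefix to keep things short.

```python
import xml.etree.ElementTree as ET


def expand_mfenced(fenced):
    """Return an <mrow> equivalent to the given <mfenced> element."""
    open_ch = fenced.get("open", "(")      # defaults per the MathML spec
    close_ch = fenced.get("close", ")")
    # Whitespace inside the separators attribute is ignored.
    seps = "".join(fenced.get("separators", ",").split())
    children = list(fenced)

    outer = ET.Element("mrow")
    ET.SubElement(outer, "mo", fence="true").text = open_ch
    if len(children) == 1:
        outer.append(children[0])
    elif len(children) > 1:
        inner = ET.SubElement(outer, "mrow")
        for i, child in enumerate(children):
            inner.append(child)
            if seps and i < len(children) - 1:
                # Reuse the last separator when there are more gaps
                # than separator characters.
                sep = seps[min(i, len(seps) - 1)]
                ET.SubElement(inner, "mo", separator="true").text = sep
    ET.SubElement(outer, "mo", fence="true").text = close_ch
    return outer


if __name__ == "__main__":
    src = ET.fromstring(
        '<mfenced open="[" close="]"><mi>a</mi><mi>b</mi></mfenced>')
    print(ET.tostring(expand_mfenced(src), encoding="unicode"))
```

The explicit form makes the fence and separator operators visible as ordinary mo elements, so their attributes (for example stretchy) can be set individually.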
So LibreOffice users will hopefully be able to generate better Web pages
with MathML and I’ll be more confident recommending LibreOffice when
people ask for a WYSIWYG editor on the MathJax mailing list!
Thanks,
--
Frédéric Wang
maths-informatique-jeux.com/blog/frederic