Recent improvements to the MathML export

Frédéric WANG fred.wang at free.fr
Thu Jul 4 08:31:57 PDT 2013


Hi all,

As announced earlier on this mailing list, I've recently been submitting 
patches, mostly to improve the MathML export. A couple of things remain 
to be reviewed/merged but after that I think LibreOffice MathML export 
will be reasonably good (although of course some MathML experts will say 
it can still be improved). At least, all the mathematical formatting
done via the StarMath language should now be preserved when exported to 
MathML and Firefox and MathJax should be able to render the MathML 
correctly. I’ve also fixed some bugs with the XHTML export when the 
document contains mathematical formulas.

In order to test the new MathML export, to verify how other rendering 
engines could handle the MathML markup and to compare visual 
improvements, I’ve written a small ODT document that contains an 
overview of LibreOffice Math features:

http://www.maths-informatique-jeux.com/international/LibreOfficeMath.zip

The archive contains two directories with the results for LibreOffice 
4.0 (the release version installed on my system) and LibreOffice 4.2 
(with all my patches applied). Each directory contains the ODT file and 
the exported XHTML page. You can open that page with your browser and 
there is also a version that uses MathJax. I’ve included PDF documents
showing the rendering with LibreOffice, Gecko and MathJax. I’ve also
extracted the MathML files so that you can compare the code or verify
the rendering in other engines. For example, try

    for f in 4.2/MathML/*; do mathmlviewer "$f"; done

to display all the MathML files in gtkmathview.
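
For anyone who wants to pull the MathML out of their own ODT files, here
is a rough Python sketch. It is not how the files in the archive were
produced. It assumes that each formula is stored in the ODT zip as a
"<object directory>/content.xml" member whose root element is <math> in
the MathML namespace; the function name extract_mathml is made up for
this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def extract_mathml(source):
    """Return {member name: XML bytes} for each embedded formula.

    `source` is a path or a binary file-like object for an ODT file.
    Assumption: formulas live in "<object dir>/content.xml" members
    whose root element is <math> in the MathML namespace.
    """
    formulas = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            # Skip the document's own top-level content.xml.
            if "/" not in name or not name.endswith("/content.xml"):
                continue
            data = archive.read(name)
            try:
                root = ET.fromstring(data)
            except ET.ParseError:
                continue  # not well-formed XML, ignore it
            if root.tag == "{%s}math" % MATHML_NS:
                formulas[name] = data
    return formulas
```

Writing each returned value to its own file gives a directory that can
be fed to a viewer such as mathmlviewer.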

For the record, here is a (certainly non-exhaustive) list of things to 
improve:

1) expressions like "1+3x" are still incorrectly interpreted as "{1+3}x" 
(bug 66200). Well, I guess that won't affect the rendering but that 
might confuse accessibility tools.

2) arbitrary Unicode characters are generally interpreted as 
identifiers. If you want to define an operator, you must use the boper, 
uoper or oper commands. You can also create your own %xxx command and 
the MathML export will use the MathML operator dictionary to guess if 
that's an operator.

3) As I read the source code, Math does not seem to handle non-BMP 
characters (probably the reason for bug 66333)

4) For vec and widevec, the combining character "U+20D7 COMBINING RIGHT 
ARROW ABOVE" is used instead of "U+2192 RIGHTWARDS ARROW". The latter is 
recommended by the MathML spec and may render better in MathML
rendering engines but the former renders better in LibreOffice Math at 
the moment. Perhaps this could be changed when stretching is implemented 
correctly (cf bug 32362 comment 21)

5) The interpretation of alignment in Math is weird. Concretely, if you 
write "matrix{alignl blah blah blah blah blah blah ## alignl 3 over 10 + 
7 over 10 = 1 }" to align the left side of the rows, this applies 
recursively and the numerator/denominator of the fractions will be 
aligned left too. In the MathML export, I only apply the alignment to 
the specified node, not to descendants. Better alignment was one of the
four main missing features listed by Thomas Lange.

6) The MathML export does not take into account the properties from the 
Format menu (except "Text mode"). Font and alignment properties could 
probably be handled in MathML. Other general spacing rules would be less
accurate and reliable if expressed with MathML spacing elements; it is
better to use the information from the OpenType MATH table for that
anyway.

7) The MathML import is not good and I suspect it is unlikely to be 
successfully used to import MathML formulas generated by third-party 
tools. LibreOffice Math is not even always able to correctly import the
MathML it generates (when you remove the StarMath annotation). I added
new MathML constructions in the export so rendering properties are now 
preserved at export but are still lost when importing it back (again, 
when you remove the StarMath annotation). I don't plan to improve this, 
especially if people always keep the StarMath annotation and if we move 
to MathML as the reference format in the future.

8) I added new MathML constructions that are from MathML2 and MathML3. I 
guess the menu should use the generic term "MathML" instead of 
specifying the obsolete version "MathML 1.0".

9) I've proposed two enhancements: an option to use MathJax (bug 66287) 
and an HTML5 export filter (bug 66044). I've submitted a WIP patch for
the first and I suspect the second could reuse the existing XHTML 
filter. But I don't know where the code to call these XSLT style sheets 
and handle the dialogs is located.

10) As Khaled Hosny mentioned in a previous message, inserting a math 
object in Word is not really convenient. Also, setting inline/display 
mode by using Format => Text is a bit tedious. In general, integration 
of formulas in the surrounding text (line breaking, spacing, alignment) 
is not really good. This is also what the other missing features listed
by Thomas Lange were about.
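
Regarding point 7, one way to test the import without the StarMath
annotation is to strip it from an exported file and open the result in
LibreOffice Math. The Python sketch below does the stripping. It assumes
that the export wraps the formula in <semantics> and stores the source in
an <annotation> element whose encoding attribute starts with "StarMath";
the function name is made up for this example.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Serialize MathML elements without a namespace prefix.
ET.register_namespace("", MATHML_NS)


def strip_starmath_annotation(mathml_text):
    """Return the MathML with any StarMath <annotation> removed.

    Assumption: the annotation is an <annotation> element in the MathML
    namespace whose "encoding" attribute starts with "StarMath".
    """
    root = ET.fromstring(mathml_text)
    annotation_tag = "{%s}annotation" % MATHML_NS
    # Take a snapshot of the elements so the tree can be modified safely.
    for parent in list(root.iter()):
        for child in list(parent):
            encoding = child.get("encoding", "")
            if child.tag == annotation_tag and encoding.startswith("StarMath"):
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Only the annotation is dropped; the <semantics> wrapper and the
presentation markup are left untouched.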

For completeness and for people interested in details, here is the list 
of bugs I've worked on:

- Variable with coefficients not italicized 
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=55853>

- Wide accents are not stretchy when exported to MathML 
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66024>

- Blanks are not correctly exported to MathML 
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66075>

- Improve grouping of binary operators 
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66081>

- Commands wideslash, widebslash and overstrike are not correctly 
exported to MathML 
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66086>

- Many mathematical symbols are incorrectly exported as <mo> operators 
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66088>

- MathML export: avoid using combining characters for accents and 
diacritical marks 
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66276>

- Use columnalign to implement matrix alignment 
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66277>

- MathML export does not distinguish between inline and display 
equations <https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66278>

- MathML export: use the operator dictionary 
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66279>

- underbrace and overbrace are not stretchy when exported to MathML 
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66281>

- Replace <mfenced> elements by equivalent <mrow>+<mo> constructions 
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66282>

- phantom and stylistic commands generate useless elements 
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66283>

- Incorrect unicode characters used in the "Brackets" 
<https://www.libreoffice.org/bugzilla/show_bug.cgi?id=66416>

- Incorrect removal of last line in SmXMLExport::ExportTable 
<https://bugs.freedesktop.org/show_bug.cgi?id=66575>
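
To illustrate the <mfenced> entry in the list above, here is a small
Python sketch of the equivalence between <mfenced> and an <mrow>+<mo>
construction, following my reading of the <mfenced> description in the
MathML spec. It is not the code from the LibreOffice patch, it ignores
XML namespaces for brevity, and the function name is made up for this
example.

```python
import xml.etree.ElementTree as ET


def expand_mfenced(mfenced):
    """Return an <mrow> equivalent to the given <mfenced> element."""
    open_ch = mfenced.get("open", "(")
    close_ch = mfenced.get("close", ")")
    # Whitespace inside the separators attribute is ignored.
    seps = "".join(mfenced.get("separators", ",").split())
    children = list(mfenced)

    outer = ET.Element("mrow")
    ET.SubElement(outer, "mo", fence="true").text = open_ch
    # A single argument goes directly between the fences; several
    # arguments are grouped in an inner <mrow> with their separators.
    body = outer if len(children) <= 1 else ET.SubElement(outer, "mrow")
    for index, child in enumerate(children):
        if index > 0 and seps:
            # If there are too few separators, the last one is repeated.
            sep = seps[min(index - 1, len(seps) - 1)]
            ET.SubElement(body, "mo", separator="true").text = sep
        body.append(child)
    ET.SubElement(outer, "mo", fence="true").text = close_ch
    return outer


source = ET.fromstring(
    '<mfenced open="[" close="]"><mi>a</mi><mi>b</mi></mfenced>')
print(ET.tostring(expand_mfenced(source), encoding="unicode"))
```

The explicit form relies only on <mrow> and <mo>, which is why it is a
safer target for rendering engines than <mfenced>.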

So LibreOffice users will hopefully be able to generate better Web pages 
with MathML and I’ll be more confident to recommend LibreOffice when 
people ask for a WYSIWYG editor on the MathJax mailing list!

Thanks,

-- 
Frédéric Wang
maths-informatique-jeux.com/blog/frederic


