[LibreOffice][svg import filter] handling text

Thu May 3 13:17:44 PDT 2012

Hi everyone.

Some thoughts on the current svg text import implementation.

(1)
The ShapeWritingVisitor does not handle the <tspan> element.
Only <text> elements are mapped to the XML_TEXT id; the id for
a <tspan> element is XML_TSPAN, and ShapeWritingVisitor::operator()
has no case for it.

(2)
Only 'x' and 'y' attributes whose value is a single coordinate
are handled, whilst the value of such attributes can be a list
of coordinates where the n-th coordinate pair represents the
position at which the n-th character included in the given <text>
or <tspan> element has to be placed.
Reference: http://www.w3.org/TR/SVG11/text.html#TextElementXAttribute
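
To make the list form concrete, here is a minimal sketch of how such an
attribute value could be read. The helper name parseCoordinateList is
made up for illustration and is not part of the current filter; units
and percentages are deliberately not handled, and std::strtod depends on
the C locale.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not in the existing filter): splits an SVG
// coordinate list such as "10, 20 30" into plain numbers.
// Assumption: values carry no units; parsing stops at the first token
// that is not a number (for example "px").
std::vector<double> parseCoordinateList(const std::string& rValue)
{
    std::vector<double> aCoords;
    const char* p = rValue.c_str();
    while (*p != '\0')
    {
        // skip separators: commas and white space
        if (*p == ',' || *p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
        {
            ++p;
            continue;
        }
        char* pEnd = nullptr;
        const double fValue = std::strtod(p, &pEnd);
        if (pEnd == p)
            break; // not a number: give up on the rest of the list
        aCoords.push_back(fValue);
        p = pEnd;
    }
    return aCoords;
}
```

A value with a single coordinate simply yields a one-element list, so
the current single-value case is covered by the same code path.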

A draft of a *possible* solution:

(1)
Implement an ad-hoc visitor to be applied to the svg DOM tree
before any other visitor in order to "normalize" text elements.
After normalization a <text> or <tspan> element that owns
a TEXT_NODE (that is an inter-tag character sequence) does not
own any ELEMENT_NODE.
So for example:
       <text>svg<tspan>import</tspan>filter</text>
should be transformed into:
       <text>
         <tspan>svg</tspan>
         <tspan>import</tspan>
         <tspan>filter</tspan>
       </text>

and:
       <text x="10, 20, 30" y="5, 15">HELLO</text>
should be transformed into:
       <text>
         <tspan x="10" y="5">H</tspan>
         <tspan x="20" y="15">E</tspan>
         <tspan x="30">LLO</tspan>
       </text>
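
A rough sketch of the splitting step for the second example, under some
assumptions: the TextRun struct and the splitByCoordinates function are
hypothetical names, the coordinate lists are already parsed into
numbers, and one byte is treated as one character (real code would have
to work on Unicode characters, not bytes).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one generated <tspan>.
struct TextRun
{
    std::string maText;
    bool mbHasX; // false: no 'x' attribute is emitted for this run
    double mfX;
    bool mbHasY; // false: no 'y' attribute is emitted for this run
    double mfY;
};

// Each character that has its own coordinate gets its own run; the run
// that takes the last available coordinate also takes all remaining
// characters. Assumption: one byte == one character (ASCII only).
std::vector<TextRun> splitByCoordinates(const std::string& rText,
                                        const std::vector<double>& rX,
                                        const std::vector<double>& rY)
{
    std::vector<TextRun> aRuns;
    if (rText.empty())
        return aRuns;

    const std::size_t nCoords = std::max(rX.size(), rY.size());
    // at least one run, at most one run per character
    const std::size_t nRuns
        = std::min(std::max<std::size_t>(nCoords, 1), rText.size());

    for (std::size_t i = 0; i < nRuns; ++i)
    {
        const bool bLast = (i + 1 == nRuns);
        TextRun aRun;
        aRun.maText = bLast ? rText.substr(i) : rText.substr(i, 1);
        aRun.mbHasX = i < rX.size();
        aRun.mfX = aRun.mbHasX ? rX[i] : 0.0;
        aRun.mbHasY = i < rY.size();
        aRun.mfY = aRun.mbHasY ? rY[i] : 0.0;
        aRuns.push_back(aRun);
    }
    return aRuns;
}
```

For "HELLO" with x = 10, 20, 30 and y = 5, 15 this gives the three runs
"H", "E" and "LLO" shown above, the last one without a 'y' value.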

(2)
Add to the AnnotateVisitor two new properties:
mnTextCurrentXPos, mnTextCurrentYPos.
After setting up all style properties the AnnotateVisitor
should perform something like the following pseudo-code.

if( Element is <text> or <tspan> )
{
    if( Element has 'x' attribute )
        mnTextCurrentXPos = value of 'x';

    if( Element has TEXT_NODE )
    {
        // text elements that do not have a TEXT_NODE are just
        // containers providing style; we do not need to handle
        // them further

        // the 'x' attribute will be added if not present
        set the value of the 'x' attribute to mnTextCurrentXPos;

        aText = extract text from Element;
        // compute the text width using
        // the current text style
        width = computeTextWidth( aText, aCurrentState );
        mnTextCurrentXPos += width;
    }

    // do the same for the 'y' attribute
}

Moreover each time a root text element starts
the value of mnTextCurrentXPos and mnTextCurrentYPos
should be reset to zero.
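
The position tracking, including the reset at each root text element,
could be sketched in C++ roughly as follows. Everything here is a
stand-in: TextElement is a toy replacement for the DOM node,
TextPositionTracker is not the real AnnotateVisitor, and
computeTextWidth is a placeholder that assumes a fixed advance of 10
units per character and ignores the style state. Only 'x' is shown; 'y'
would be analogous.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a normalized <text>/<tspan> DOM element.
struct TextElement
{
    bool mbHasX = false;
    double mfX = 0.0;
    std::string maText;                  // empty: pure style container
    std::vector<TextElement> maChildren; // <tspan> children
};

// Placeholder only: assumes 10 units per character, ignores font style.
double computeTextWidth(const std::string& rText)
{
    return 10.0 * static_cast<double>(rText.size());
}

class TextPositionTracker
{
public:
    // Call once per root <text> element: the position is reset to zero.
    void visitRootText(TextElement& rRoot)
    {
        mnTextCurrentXPos = 0.0;
        visit(rRoot);
    }

private:
    void visit(TextElement& rElem)
    {
        if (rElem.mbHasX)
            mnTextCurrentXPos = rElem.mfX;

        if (!rElem.maText.empty())
        {
            // add (or overwrite) 'x' with the current text position
            rElem.mbHasX = true;
            rElem.mfX = mnTextCurrentXPos;
            mnTextCurrentXPos += computeTextWidth(rElem.maText);
        }

        for (TextElement& rChild : rElem.maChildren)
            visit(rChild);
    }

    double mnTextCurrentXPos = 0.0;
};
```

With the placeholder width, three children "svg", "import" and a third
one carrying x="200" end up at x = 0, 30 and 200.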

The above implementation follows what all current browsers do
when rendering svg text: if a <tspan> element does not specify
an 'x' attribute, the current text position is used, that is,
the position derived from the last seen 'x' attribute, not the
parent's one.
The standard, however, says something different:

<< If the attribute is not specified: (a) if an ancestor ‘text’
or ‘tspan’ element specifies an absolute X coordinate for
a given character via an ‘x’ attribute, then that absolute X
coordinate is used (nearest ancestor has precedence),
else (b) the starting X coordinate for rendering the glyphs
corresponding to a given character is the X coordinate of
the resulting current text position from the most recently
rendered glyph for the current ‘text’ element.>>

Computation of text width and height should take into account
the value of text style attributes.
The real problem is how to perform such computations.

Note that, to avoid making things even more complex, I have
ignored the dx, dy, and rotate attributes, and transformations too.

(3)
XML_TEXT and XML_TSPAN should be handled by the same case:
if the element owns a TEXT_NODE (and so no ELEMENT_NODE after
normalization), the text is extracted and an ODF text element
is created;
if the element has only ELEMENT_NODEs (that is, one or more
<tspan>), it should be handled like a <g> element.
The visitElements routine will be responsible for iterating
over the children (<tspan> elements).
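
As an illustration of the shared case, here is a small self-contained
sketch. The ElementId enum, the Node struct and visitNode are toy
stand-ins, not the real token ids, DOM nodes or ShapeWritingVisitor;
writing the ODF text element is replaced by collecting the text into a
vector.

```cpp
#include <string>
#include <vector>

// Stand-ins for the filter's token ids.
enum ElementId { XML_G, XML_TEXT, XML_TSPAN };

// Toy node: after normalization a text element has either text
// or child elements, never both.
struct Node
{
    ElementId meId;
    std::string maText;
    std::vector<Node> maChildren;
};

void visitNode(const Node& rNode, std::vector<std::string>& rOut);

// Iterates over the children, as the visitElements routine would.
void visitElements(const Node& rNode, std::vector<std::string>& rOut)
{
    for (const Node& rChild : rNode.maChildren)
        visitNode(rChild, rOut);
}

void visitNode(const Node& rNode, std::vector<std::string>& rOut)
{
    switch (rNode.meId)
    {
        case XML_TEXT:
        case XML_TSPAN:
            if (!rNode.maText.empty())
            {
                // here an ODF text element would be written
                rOut.push_back(rNode.maText);
                break;
            }
            // no TEXT_NODE: behave like a group
            visitElements(rNode, rOut);
            break;
        case XML_G:
            visitElements(rNode, rOut);
            break;
    }
}
```

A <text> that only contains <tspan> children thus produces one text
entry per leaf <tspan>, in document order.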

Well, this certainly lacks a lot of details, and I have not taken
several issues into account; still, I think it can be regarded
as a starting point.

Cheers,
-- Marco