Guidance on 'Paragraph Tab' bug

Mon Jul 1 01:43:42 PDT 2013

Hi Adam,

On Thu, Jun 27, 2013 at 06:18:09PM +0300, Adam Fyne <Adam.Fyne at cloudon.com> wrote:
> I didn't post this on the IRC because it is too long and too specific, and
> I feel it will be lost there…

Sure, for some kind of discussions the mailing list is a better place.

> I want to fix a bug with import \ export of a 'Paragraph Tab'.
> 
> I've attached a really simple DOCX with such a paragraph tab.
> 
> The XML node is 'w:ptab' inside a 'run' node.

I see. Indeed, looks like this is not imported (correctly).

> When it goes through Writer – it is transformed to a simple tab.
> 
> I would like to fix this so that the 'ptab' is:
> 
> 1.       Import 'ptab' from DOCX
> 
> 2.       Store the 'ptab' attributes in the Writer's core
> 
> 3.       Render correctly on the screen (2nd run will be aligned to the
> right)
> 
> 4.       Export 'ptab' back to DOCX

Hmm, this sounds like a new feature -- doing that would be great, but I
would suggest finishing your previous feature first (the character
shading one), where the ODF filters are not yet updated.

> After doing some digging, I found this in 'model.xml':
> 
>    <resource name="CT_PTab" resource="Stream" tag="paragraph">
>      <attribute name="alignment" tokenid="ooxml:CT_PTab_alignment"/>
>      <attribute name="relativeTo" tokenid="ooxml:CT_PTab_relativeTo"/>
>      <attribute name="leader" tokenid="ooxml:CT_PTab_leader"/>
>      <action name="end" action="tab"/>
>    </resource>
> 
> 
> 
> And also found this:
> 
>    <resource name="CT_Tab" resource="Stream" tag="content">
>      <action name="end" action="tab"/>
>    </resource>
> 
> 
> 
> I have a few questions:
> 
> 1.       Shouldn't "CT_PTab" call "ptab" instead of "tab"?

That's right, except that writerfilter::ooxml::OOXMLFastContextHandler
has a tab() method but no ptab() method, so adding one is the first
thing you would need to implement.

> 2.       What is the meaning of the 'tag' attribute of the 'resource' node?

As far as I know, the <action .. action="name"/> is always a method
call.

> 3.       The way information is stored in 'model.xml' is so confusing.

You're not alone. writerfilter/documentation/ooxml/model.xml documents
what we have found out so far; feel free to extend it if you manage to
decode some more detail.

In short, whenever you add support for new XML tags, you typically need
to extend the file in two places:

- the new tag is a child of some existing tag, so extend the parent's
  definition
- you also need to add a matching <resource> tag in model.xml

Once those two definitions match, you get new tokens in dmapper.
363dafefad14411a16f6ea9d2ee0d55b67bc9c8d is hopefully a good example.
(Though your case is easier, as you add a new token in an existing
namespace.)

> Some of the info is stored like this (resource + attributes + action),
> some are stored as 'define' + 'attribute' + 'ref',
> some are stored as 'resource' + 'value's.
> This is more of a general question, but – what is the difference between
> these nodes?

First, it probably makes sense to see how RELAX NG works, e.g. by
looking at the RELAX NG definition of the ODF format. ref/define is just a
way to avoid copy&paste, you define something first, then you can refer
to it (by name, using "ref") multiple times. If I'm not mistaken, the
only non-RELAX NG tag you need in model.xml is the <resource> one, as
explained above.

> From the code – I understood that 'action' calls a function in
> "OOXMLFastContextHandler".
> 
> When do we need such actions? Why is this done on some nodes and on other
> nodes (like 'run', 'paragraph', 'brush' etc) not done?
> 
> 
> So – say I need to add a new function called 'ptab' to
> 'OOXMLFastContextHandler' – Do I simply copy the logic of 'tab()' ?

I think it's all about where you want to handle the input. Normally,
the tokenizer just generates these tokens, and dmapper does the mapping.

However, other formats (RTF, WW8) handle a tab as a normal character.
For DOCX, an action is therefore used to convert the OOXML tokens to a
simple character, so in dmapper you always get a tab character. In other
words, actions are used to generate these "fake tokens". Another
example: w:hyperlink is also handled in the tokenizer, which generates a
HYPERLINK field from it, and dmapper handles only that.
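
To make the "fake token" idea concrete, here is a toy sketch. It is not
the actual writerfilter code: ToyMapper, ToyHandler and ToyRunItem are
made-up names, and only the general shape (a tab() action that forwards
a plain '\t' to a utext()-style method) follows the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the mapper side: it only ever sees text.
struct ToyMapper {
    std::u16string text;

    void utext(const char16_t* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        text.append(data, len);
    }
};

// Hypothetical stand-in for what the tokenizer reads from a run.
enum class ToyElement { Text, Tab };

struct ToyRunItem {
    ToyElement kind;
    std::u16string chars; // only meaningful for ToyElement::Text
};

// Hypothetical stand-in for the tokenizer-side handler.
class ToyHandler {
public:
    explicit ToyHandler(ToyMapper& mapper) : mapper_(mapper) {}

    // The "action": turn the tab element into an ordinary character,
    // so the mapper never has to know that a tab element existed.
    void tab() {
        static const char16_t kTab[] = { u'\t' };
        mapper_.utext(kTab, 1);
    }

    void process(const std::vector<ToyRunItem>& items) {
        for (const ToyRunItem& item : items) {
            if (item.kind == ToyElement::Tab)
                tab();
            else
                mapper_.utext(item.chars.data(), item.chars.size());
        }
    }

private:
    ToyMapper& mapper_;
};
```

Feeding this handler a text item, a tab item and another text item
leaves the toy mapper with a single string that has a '\t' in the
middle, which is the point: the element has become a character before
the mapper sees it.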

> What does the 'utext' function do?

Apart from logging, see
writerfilter::dmapper::DomainMapper::lcl_utext(). That's where dmapper
receives all the Unicode text input.

> Where do I parse the attributes themselves of the 'ptab'?

If you handle ptab as a normal element in model.xml, you'll have the
usual way to get all its attributes. I would recommend going that way,
as ptab is not a character (tab is), but an element with attributes.
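
As a rough sketch of what "an element with attributes" means on the
receiving side, the toy code below collects the three attributes named
in the quoted model.xml snippet into a small struct. All type and
method names here (ToyPTabAttr, ToyPTab, ToyPTabHandler) are
hypothetical and do not correspond to real dmapper classes; the sample
values in the note after the code are an assumption based on the DOCX
format, not something taken from the attached file.

```cpp
#include <cassert>
#include <string>

// Hypothetical ids, named after the tokenid values in model.xml.
enum class ToyPTabAttr { Alignment, RelativeTo, Leader };

// Hypothetical container for what a ptab element carries.
struct ToyPTab {
    std::string alignment;
    std::string relativeTo;
    std::string leader;
};

// Hypothetical handler: called once per attribute of the element.
class ToyPTabHandler {
public:
    void attribute(ToyPTabAttr id, const std::string& value) {
        // This sketch assumes attribute values are never empty.
        assert(!value.empty());
        switch (id) {
        case ToyPTabAttr::Alignment:
            ptab_.alignment = value;
            break;
        case ToyPTabAttr::RelativeTo:
            ptab_.relativeTo = value;
            break;
        case ToyPTabAttr::Leader:
            ptab_.leader = value;
            break;
        }
    }

    const ToyPTab& result() const { return ptab_; }

private:
    ToyPTab ptab_;
};
```

Calling attribute() with, say, "right", "margin" and "none" for the
three ids would leave result() holding all three values together,
which a plain tab character could not carry.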

> So I hope after I read your advice from this email – I will implement the
> 'DOCX importer' for the 'ptab'.
> 
> Should I then create a *new* core object for the 'Paragraph Tab' or should
> I add it as properties to some existing object of the core?

I would check how existing similar features are implemented, and do
something similar. Normal tabs are not a good example, as those are
stored as a \t character inside SwTxtNode, but page breaks may be a good
example.

> This email is too long, so I won't burden you now with 'rendering' and
> 'exporter' questions…

Sure, so -- as usual, the first step would be to design how the document
model should store these paragraph tabs, then either do the UNO API or
some UI, so you can test it. Then you can continue with filters and
layout, etc.

Hope this helps,

Miklos
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Paragraph Tab.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 10239 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20130701/158c2ca1/attachment.docx>