Guidance on 'Paragraph Tab' bug

Mon Aug 5 07:27:50 PDT 2013

Hi,

Thanks for the pointers.

I've now got around to working on this.

After analyzing it, I've noticed that normal tabs in Word work like this:

A '<w:tabs>' node holds a series of '<w:tab>' child nodes that describe
the tab stops.

This is translated to an 'SvxTabStopItem' that holds a list of
'SvxTabStop' objects.

Their properties differ from the ones I need for a positional tab.

So I've decided to create a new class, 'SvxPositionalTabItem', that holds
those properties.
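
As a rough sketch of what such a class might hold: the three fields below
mirror the 'alignment', 'relativeTo' and 'leader' attributes of 'w:ptab'.
All type and member names here are hypothetical and are not existing
LibreOffice code, and the default values are an arbitrary choice made for
the sketch.

```cpp
#include <cassert>

// Hypothetical names for illustration only; not LibreOffice classes.
enum class PTabAlignment { Left, Center, Right };
enum class PTabRelativeTo { Margin, Indent };
enum class PTabLeader { None, Dot, Hyphen, Underscore, MiddleDot };

// Holds the three attributes that a 'w:ptab' element carries.
struct PositionalTab
{
    // Defaults are arbitrary placeholders for this sketch.
    PTabAlignment alignment = PTabAlignment::Left;
    PTabRelativeTo relativeTo = PTabRelativeTo::Margin;
    PTabLeader leader = PTabLeader::None;

    // Two positional tabs are equal when all three attributes match.
    bool operator==(const PositionalTab& rOther) const
    {
        return alignment == rOther.alignment
            && relativeTo == rOther.relativeTo
            && leader == rOther.leader;
    }
};
```

A real pool item would wrap something like this and add the item
plumbing; the sketch only shows the data it has to carry.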

I have a few rather simple questions:

1. I see that some classes in '/editeng/source/items' have a
'QueryValue' and 'PutValue' and some don't.
What is the logic for deciding whether an 'SfxPoolItem' should or should
not implement these 2 functions?
AFAIK these 2 functions are mandatory if you want to be able to use UNO
mapping with these classes, no?
So how come there are classes that don't implement them?

2. I see that there is a function called 'GetPresentation'.
I am not sure what the logic of this function is or how I am supposed to
implement it for a 'Positional Tab'.
Can someone shed some light on this?

Once I get the 'SvxPositionalTabItem' nailed down, what would be the
obvious next step?

- Add some UNO mapping logic?
- Add rendering logic for the positional tab item?
- Add import / export from DOCX? (That's where it all started.)

Best,

       Adam Fyne

-----Original Message-----
From: Miklos Vajna [mailto:vmiklos at suse.cz]
Sent: Monday, July 1, 2013 11:44 AM
To: Adam Fyne
Cc: libreoffice at lists.freedesktop.org
Subject: Re: Guidance on 'Paragraph Tab' bug

Hi Adam,

On Thu, Jun 27, 2013 at 06:18:09PM +0300, Adam Fyne <Adam.Fyne at cloudon.com>
wrote:

> I didn't post this on the IRC because it is too long and too specific,
> and I feel it will be lost there…

Sure, for some kind of discussions the mailing list is a better place.

> I want to fix a bug with import / export of a 'Paragraph Tab'.
>
> I've attached a really simple DOCX with such a paragraph tab.
>
> The XML node is 'w:ptab' inside a 'run' node.

I see. Indeed, looks like this is not imported (correctly).

> When it goes through Writer, it is transformed to a simple tab.
>
> I would like to fix this so that the 'ptab' is:
>
> 1. Imported from DOCX
> 2. Stored, with its attributes, in Writer's core
> 3. Rendered correctly on the screen (2nd run will be aligned to the
>    right)
> 4. Exported back to DOCX

Hmm, this sounds like a new feature -- doing that would be great, but I
would suggest to finish your previous feature first (the character shading
one), where the ODF filters are not yet updated.

> After doing some digging, I found this in 'model.xml'
> (http://opengrok.libreoffice.org/xref/core/writerfilter/source/ooxml/model.xml#22530):
>
>    <resource name="CT_PTab" resource="Stream" tag="paragraph">
>      <attribute name="alignment" tokenid="ooxml:CT_PTab_alignment"/>
>      <attribute name="relativeTo" tokenid="ooxml:CT_PTab_relativeTo"/>
>      <attribute name="leader" tokenid="ooxml:CT_PTab_leader"/>
>      <action name="end" action="tab"/>
>    </resource>
>
> And also found this
> (http://opengrok.libreoffice.org/xref/core/writerfilter/source/ooxml/model.xml#22574):
>
>    <resource name="CT_Tab" resource="Stream" tag="content">
>      <action name="end" action="tab"/>
>    </resource>
>
> I have a few questions:
>
> 1. Shouldn't "CT_PTab" call "ptab" instead of "tab"?

That's right, except that writerfilter::ooxml::OOXMLFastContextHandler
has a tab() method but no ptab() method; that will be one thing you need
to implement first.

> 2. What is the meaning of the 'tag' attribute of the 'resource' node?

As far as I know, the <action .. action="name"/> is always a method call.

> 3. The way information is stored in 'model.xml' is so confusing.

You're not alone; writerfilter/documentation/ooxml/model.xml is what we
have found out so far. Feel free to extend it if you manage to decode some
more detail.

In short, whenever you add support for new XML tags, you typically need to
extend the file in two places:

- the new tag is a child of some existing tag, so extend the parent's
  definition
- you also need to add a matching <resource> tag in model.xml

Once those two definitions match, you get new tokens in dmapper.

363dafefad14411a16f6ea9d2ee0d55b67bc9c8d is hopefully a good example.

(Though your case is easier, as you add a new token in an existing
namespace.)

> Some of the info is stored like this (resource + attributes + action),
> some are stored as 'define' + 'attribute' + 'ref', some are stored as
> 'resource' + 'value's.
> This is more of a general question, but what is the difference
> between these nodes?

First, it probably makes sense to see how RELAX NG works, e.g. have a look
at the RELAX NG definition of the ODF format. ref/define is just a way to
avoid copy and paste: you define something first, then you can refer to it
(by name, using "ref") multiple times. If I'm not mistaken, the only
non-RELAX NG tag you need in model.xml is the <resource> one, as explained
above.

> From the code, I understood that 'action' calls a function in
> "OOXMLFastContextHandler".
>
> When do we need such actions? Why is this done on some nodes and not
> on other nodes (like 'run', 'paragraph', 'brush' etc)?
>
> So, say I need to add a new function called 'ptab' to
> 'OOXMLFastContextHandler': do I simply copy the logic of 'tab()'?

I think it's all about where you want to handle the input. Normally, the
tokenizer just generates these tokens, and dmapper does the mapping.

However, in the case of tabs, other formats (RTF, WW8) handle the tab as a
normal character, so for DOCX an action is used that converts the OOXML
tokens to a simple character, and in dmapper you always get a tab
character. So actions are used to generate these "fake tokens".
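
The idea of an action turning an element into a plain character can be
sketched as follows. This is a simplified, self-contained illustration
and not the actual writerfilter code: 'TextSink' and 'ContextHandler' are
hypothetical stand-ins for dmapper and the tokenizer-side handler, and
only the method names tab() and utext() are taken from the discussion.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the domain mapper: it only records the text
// it receives.
struct TextSink
{
    std::string received;
    void utext(const std::string& rText) { received += rText; }
};

// Hypothetical stand-in for the tokenizer-side context handler.
class ContextHandler
{
public:
    explicit ContextHandler(TextSink& rSink) : mrSink(rSink) {}

    // Ordinary run text is forwarded unchanged.
    void characters(const std::string& rText) { mrSink.utext(rText); }

    // The "tab" action: called for the tab element, it emits a plain
    // tab character instead of an element token.
    void tab() { mrSink.utext("\t"); }

private:
    TextSink& mrSink;
};
```

In this sketch, calling characters("left"), then tab(), then
characters("right") leaves the sink holding one string with a tab
character between the two words, so the receiving side never sees a
separate tab element.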

Another example: w:hyperlink is also handled in the tokenizer, which
generates a HYPERLINK field from it, and dmapper handles only that.

> What does the 'utext' function do?

Apart from logging, see
writerfilter::dmapper::DomainMapper::lcl_utext(). That's where dmapper
receives all the Unicode text input.

> Where do I parse the attributes themselves of the 'ptab'?

If you handle ptab as a normal element in model.xml, you'll have the usual
way to get all its attributes. I would recommend going that way, as ptab is
not a character (tab is), but an element with attributes.
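
Handling ptab as an element means its attributes arrive one at a time and
can be collected as they come in. The sketch below assumes a callback
that receives a token id and a string value; 'PTabToken',
'PTabAttributes' and 'handleAttribute' are hypothetical names, loosely
modelled on the tokenid values in the quoted model.xml excerpt, and the
real generated constants and callback signature may look different.

```cpp
#include <cassert>
#include <string>

// Hypothetical token ids, one per attribute of the ptab element.
enum class PTabToken { Alignment, RelativeTo, Leader };

// Raw attribute values collected while the element is being read.
struct PTabAttributes
{
    std::string alignment;
    std::string relativeTo;
    std::string leader;
};

// Called once per attribute; stores the value in the matching field.
void handleAttribute(PTabAttributes& rAttrs, PTabToken eToken,
                     const std::string& rValue)
{
    switch (eToken)
    {
        case PTabToken::Alignment:
            rAttrs.alignment = rValue;
            break;
        case PTabToken::RelativeTo:
            rAttrs.relativeTo = rValue;
            break;
        case PTabToken::Leader:
            rAttrs.leader = rValue;
            break;
    }
}
```

Once the element ends, the collected values can be turned into whatever
representation the document model uses.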

> So I hope that, after reading your advice in this email, I will be able
> to implement the 'DOCX importer' for the 'ptab'.
>
> Should I then create a *new* core object for the 'Paragraph Tab' or
> should I add it as properties to some existing object of the core?

I would check how existing similar features are implemented, and do
something similar. Normal tabs are not a good example, as those are stored
as a \t character inside SwTxtNode, but a page break may be a good example.

> This email is too long, so I won't burden you now with 'rendering' and
> 'exporter' questions…

Sure, so -- as usual, the first step would be to design how the document
model should store these paragraph tabs, then either do the UNO API or some
UI, so you can test it. Then you can continue with filters and layout, etc.

Hope this helps,

Miklos