[Libreoffice-commits] core.git: sw/inc sw/source

Michael Stahl mstahl at redhat.com
Fri Aug 18 10:32:11 UTC 2017


On 17.08.2017 17:10, Ashod Nakashian wrote:
> Hi Thorsten,
> 
> On Wed, Aug 16, 2017 at 5:22 AM, Thorsten Behrens <thb at libreoffice.org>
> wrote:
> 
>     Miklos Vajna wrote:
>     > The idea is that per-paragraph signature should be non-chained, similar
>     > to per-document signatures, so the Writer field(s) representing the
>     > signature(s) should be filtered out before hashing, but otherwise this
>     > just takes the paragraph text as-is. (My understanding is that ODF
>     > specifies what is the exact paragraph string for a <text:p> element.)
>     >
>     Hi Miklos,
> 
>     ok - as long as that could be described (or pseudo code given),
>     that'll do I guess. Just be aware that text:p can still be quite
>     complex in xml, with whitespace mangling & all sorts of child elements
>     (see paragraph-content-or-hyperlink / paragraph-content in the
>     schema).
> 
> 
> The code currently in master was a temporary first step. The logic I
> have ready locally, and will push soon, uses only Text portions.
> 
> Roughly as follows:
> 
>   OUStringBuffer strBuf;
>   for (auto& portion : paragraphTextPortions) {
>       if (portion.TextPortionType == "Text")
>           strBuf.append(portion.Text);
>   }
>   sign(strBuf.makeStringAndClear());
> 
> I expect this should exclude any unwanted fields/characters/LO-specific
> conversions etc.
> 
> Let me know if there are concerns with this approach.

There are some other portion types that, depending on what you want to
do, could be interpreted as containing text:

* "TextField" "generates" text
* "Frame" references paragraphs which contain text
* "Footnote" references paragraphs which contain text
* "InContentMetadata" contains text that is in the paragraph, but you
  have to recursively enumerate its text portions to get at it; it is
  not in the paragraph's own enumeration
  (your use case makes me regret that choice of API representation)
* "TextField" may be a "MetadataField" which doesn't generate text but
  has to be recursively enumerated just like "InContentMetadata"
* "Annotation" references (editengine) paragraphs which contain text
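
As a rough sketch of the recursive enumeration described in the list
above: this is not the real UNO API. `Portion`, `isMetadataField`,
`children`, `collectText` and `paragraphTextForSigning` are hypothetical
names invented for illustration. It keeps "Text" portions, descends into
"InContentMetadata" and metadata fields, and skips everything else.
Whether fields, frames, footnotes and annotations should really be
skipped depends on what the signature is meant to cover.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of a paragraph's text portion
// enumeration. This only models the idea; it is not the UNO API.
struct Portion {
    std::string type;               // e.g. "Text", "TextField", "InContentMetadata"
    std::string text;               // text carried directly by the portion
    bool isMetadataField = false;   // only meaningful when type == "TextField"
    std::vector<Portion> children;  // nested enumeration, empty if none
};

// Append the text that is part of the paragraph itself to `out`.
void collectText(const std::vector<Portion>& portions, std::string& out) {
    for (const Portion& p : portions) {
        if (p.type == "Text") {
            out += p.text;
        } else if (p.type == "InContentMetadata"
                   || (p.type == "TextField" && p.isMetadataField)) {
            // The text lives in the nested enumeration, not in the
            // paragraph's own one, so recurse into it.
            collectText(p.children, out);
        }
        // Assumption of this sketch: ordinary fields, frames, footnotes
        // and annotations are skipped, because they either generate text
        // or only reference other paragraphs.
    }
}

// Build the string that would be handed to the signing code.
std::string paragraphTextForSigning(const std::vector<Portion>& portions) {
    std::string out;
    collectText(portions, out);
    return out;
}
```

With this shape, a signature field inside the paragraph contributes
nothing to the hashed string, while text wrapped in in-content metadata
is still included.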

There are also various other functions that return "cleaned up" text
from a paragraph, such as SwTextNode::GetExpandText() and class
ModelToViewHelper, but I'm not even sure why there are several
different ones or when to use which one.


