[GSOC] ODS Styles Import Performance Work Update

Kohei Yoshida kohei.yoshida at gmail.com
Fri Aug 17 17:20:42 PDT 2012


Hi Daniel,

On Fri, Aug 17, 2012 at 5:50 PM, Daniel Bankston
<daniel.dev.libreoffice at gmail.com> wrote:
> Hi, Kohei,
>
> From what I understand, ScXMLImport::SetStyleToRanges was being called for
> each row.  For each of it's calls, the result was multiple calls to where
> the time was being spent in ScMarkArray::SetMarkArea and
> ScMarkArray::Search.  This is from the call chain of
> ScXMLImport::SetStyleToRanges's use of
> XMLTabelStyleContext::FillPropertySet.
>
> My first goal was to try to reduce the number of
> XMLTabelStyleContext::FillPropertySet calls by only calling it if the
> current style name was different from the previous style name. This way, if
> several rows have the same style, we can just keep saving the ranges and
> then make one call to apply styles for all of those rows.
>
> Unfortunately, that didn't improve performance.  As far as I can tell, it
> just pushed the amount of time that would have been spent on each row into
> one call that took approximately the sum of all of the rows' times.
>
> Ok, I admit I haven't tried to figure out the inner workings of ScMarkData
> and ScMarkArray, but I decided maybe changing what methods we use from those
> classes would help. ScCellRangesBase::GetMarkData uses
> ScMarkData::MarkRangeList where we iterate through an ScRangeList that
> ScCellRangesBase maintains. Each iteration causes multiple calls to
> ScMarkArray::SetMarkArea and Search.
>
> So I used ScRangeList::Combine to create an ScRange and use that range with
> ScMarkData::SetMarkArea.  This combined with my first goal did improve
> import time by 2 or 3 seconds, but I was expecting much more!
>
> With these code changes, callgrind output reveals to me that we are no
> longer spending lots of time in ScMarkData/Array methods (or ScRangeList for
> that matter).  Although we were already spending time in malloc's and
> free's, after these changes, we seem to spend just a little bit more time
> there now.
>
> So I wonder if this means that styles wasn't necessary this file's biggest
> slow down.

Well, the styles are still a significant cause of poor performance.
Maybe not that much with this particular document, since this document
only has a bunch of raw string cells with not much styles.  But even
then, the time it spends on importing styles is not insignificant.

  No doubt that converting sc style UNO calls would help
> performance at least a bit.

Yes.  That should certainly help, hopefully more than just "a bit".
:-)  Also note that, removing the UNO calls would also remove the use
of ScMarkData.  ScMarkData is basically a view data, to store current
range selections.  To me it makes no sense to use this when importing
a document.

The biggest win with removing the UNO calls (and also the ScDocFunc
calls) is that doing that would lay down the foundation for further
performance tweak.  By removing these layers and exposing ScDocument
to the XML import code, we could then further optimize style settings
directly to ScDocument.  That would be much easier to do than trying
to do the same thing via UNO layer.

Interesting thing is, we used to have similar performance problem with
styles import with Excel files.  And I did take a very similar
approach to what you just tried.  The difference is that, I set the
styles directly to ScDocument using the same data structure that's
used in ScAttrArray to store the styles attributes, to avoid
unnecessary conversation of data, and also unnecessary broadcasting
etc that it would normally do when modifying its internal content.
That resulted in a pretty big performance improvement.

You can take a look at ScDocument::SetAttrEntries and its call chain
all the way down to ScAttrArray.

So, the approach I would aim for is to change the XML import code to
use ScDocument::SetAttrEntries to set the styles directly, then see if
that would improve the performance.  I'm pretty sure that, as long as
we keep using the UNO layer to set properties there isn't much we can
do to improve performance further.  We need to break that layer first.

I wonder if the biggest problem now for a file
> like this is xmloff because it's methods are all pretty high up there still
> as far as callgrind time is concerned.   (Strings, lookups, etc.)

Yup. The string allocations during parsing unfortunately plays a large
part of the overall poor import performance. But reworking that would
require reworking the whole XML import code stack, which would require
a bit more effort than we have time for GSOC, to say the least...

Anyway, don't feel too bad about it.  Sometimes the perf work goes
like that, and I have had my share of disappointment over the years.
This styles import code is especially a tough nut to crack....

Best,

Kohei


More information about the LibreOffice mailing list