[GSOC] Using cached formula results during ODS import

Daniel Bankston daniel.dev.libreoffice at gmail.com
Tue Jul 3 14:46:12 PDT 2012

Hi, Kohei and Markus,

I have committed my initial implementation of using imported formula 
results during ODS import if the document was generated by LibreOffice. 
It's not as clean and symmetric as I would have liked, but it's the best 
I have right now. I hope it's not too hackish, ;-) and I await your 
valuable comments and criticisms.

If an ScFormulaCell is set to dirty, it will be re-Interpreted 
(recalculated). The goal is to avoid this during ODS import of a 
LibreOffice-generated document and instead use cached formula results 
imported from the document itself to achieve better performance.

I added a new flag in ScDocument (kind of like IsImportingXML) that is 
set while a LibreOffice-generated ODS document is being imported into 
the document. In this email, I'll call it the ScDocument 
libreoffice-generated-doc flag. The flag is set in 
ScXMLImport::setTargetDocument() because I needed a valid ScDocument to 
have already been created and I needed access to the XML meta generator. 
The flag is always reset in ScDocShell::LoadXML() because 
ScDocShell::AfterXMLLoading() will try to set ScFormulaCells to dirty.
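
A minimal sketch of the flag logic described above, not the committed 
code. ImportDocSketch, isLibreOfficeGenerator() and the two helper 
functions are hypothetical names, and the sketch assumes that the meta 
generator string of a LibreOffice-written file starts with 
"LibreOffice":

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for ScDocument; only the new flag is modelled.
struct ImportDocSketch
{
    bool bGeneratedByLibreOffice = false;
};

// Assumption: the generator string begins with the product name.
bool isLibreOfficeGenerator(const std::string& rGenerator)
{
    const std::string aPrefix = "LibreOffice";
    return rGenerator.compare(0, aPrefix.size(), aPrefix) == 0;
}

// Models the step done in ScXMLImport::setTargetDocument(): a valid
// document and the generator string are both available at this point.
void setFlagFromGenerator(ImportDocSketch& rDoc, const std::string& rGenerator)
{
    rDoc.bGeneratedByLibreOffice = isLibreOfficeGenerator(rGenerator);
}

// Models the unconditional reset done in ScDocShell::LoadXML().
void resetFlag(ImportDocSketch& rDoc)
{
    rDoc.bGeneratedByLibreOffice = false;
}
```

With this sketch, a generator string that starts with another product 
name leaves the flag unset, so such documents would still be 
recalculated as before.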

If the ScDocument libreoffice-generated-doc flag is set, 
ScXMLTableRowCellContext::EndElement()'s callees will set each newly 
created ScFormulaCell to not dirty (a newly created ScFormulaCell is set 
to dirty by default).

At several points during the import process, the code attempts to set 
all ScFormulaCells to dirty. To prevent this, 
ScFormulaCell::SetDirtyVar() will not set a ScFormulaCell to dirty if 
the ScDocument libreoffice-generated-doc flag is set.
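
The guard can be pictured roughly as follows. This is a simplified 
model under stated assumptions, not the real ScFormulaCell: 
GuardDocSketch and FormulaCellSketch are hypothetical types, and 
ResetDirty() and IsDirty() are illustrative helpers that exist only in 
this sketch:

```cpp
#include <cassert>

// Hypothetical stand-in for ScDocument; only the new flag is modelled.
struct GuardDocSketch
{
    bool bGeneratedByLibreOffice = false;
};

// Hypothetical stand-in for ScFormulaCell.
class FormulaCellSketch
{
public:
    explicit FormulaCellSketch(const GuardDocSketch& rDoc) : mrDoc(rDoc) {}

    // Refuses to mark the cell dirty while the document flag is set,
    // so the cached result is kept and no recalculation is triggered.
    void SetDirtyVar()
    {
        if (mrDoc.bGeneratedByLibreOffice)
            return;
        mbDirty = true;
    }

    // Models the import code clearing the default dirty state.
    void ResetDirty() { mbDirty = false; }

    bool IsDirty() const { return mbDirty; }

private:
    const GuardDocSketch& mrDoc;
    bool mbDirty = true; // a newly created cell is dirty by default
};
```

Once the flag is reset, SetDirtyVar() behaves as before, so later 
edits still mark cells dirty.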

I still need to implement checking for special cases of functions, such 
as NOW(), where we always want to recalculate the formula result.
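
One possible shape for that check, as a sketch only. The function 
names here are hypothetical, the list of functions is illustrative 
rather than complete, and the sketch assumes the names of the 
functions used by a formula are already available as strings:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Illustrative, incomplete list of functions whose result must never
// be taken from the cache.
bool isAlwaysRecalcFunction(const std::string& rName)
{
    static const std::set<std::string> aVolatile = { "NOW", "TODAY", "RAND" };
    return aVolatile.count(rName) != 0;
}

// Decides whether the cached result of one formula may be used.
// rFunctionNames holds the functions that appear in the formula.
bool canUseCachedResult(bool bGeneratedByLibreOffice,
                        const std::vector<std::string>& rFunctionNames)
{
    if (!bGeneratedByLibreOffice)
        return false; // unknown producer: recalculate as before
    for (const std::string& rName : rFunctionNames)
    {
        if (isAlwaysRecalcFunction(rName))
            return false; // e.g. NOW(): the cached value is stale
    }
    return true;
}
```

A formula such as =SUM(A1:A3)+NOW() would then be left dirty, while 
=SUM(A1:A3) would keep its imported result.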

Here are import times before and after my commit 
7a0fba0b0225f59f8c38b245cb21b81750271e26, using Markus's large matrix 
test file with complicated functions (single sheet with approx. 300 cols x 
5200 rows), on an --enable-symbols libo build on a machine with an AMD 
Athlon 64 X2 Dual Core 6400 @ 3.2GHz and 8GB RAM running 64-bit GNU/Linux:

Before commit: 22 seconds
After commit: 21 seconds

Really, it's probably an improvement of less than a second, which is not so great.

Two areas that I think MAY help improve performance a little:

1) Since the goal was to just stop Interpret() from being called, my 
current implementation allows the whole SetDirty() call chain to call 
all the way down to the lowest level, which is 
ScFormulaCell::SetDirtyVar(), before preventing the ScFormulaCell 
object's dirty flag from being set. Performance may improve a bit if I 
stop this call chain at a higher level, avoiding the overhead of the 
calls and the peripheral logic that runs along the call chain. I just 
have to make sure that this peripheral logic isn't required.

2) ScXMLImport::setTargetDocument() is called for each tab in the 
document which means the meta generator is checked and the ScDocument 
libreoffice-generated-doc flag is set for each tab. Maybe I can find a 
way to do this once while still having access to the XML and a valid 
ScDocument. However, this probably won't affect performance much at all 
especially in the test document I was using.
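
To illustrate point 1, here is a toy comparison of guarding at the 
lowest level versus guarding higher up. SheetDocSketch and CellSketch 
are hypothetical types, and the call counter exists only to make the 
difference visible; the real SetDirty() call chain does more work than 
this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ScFormulaCell.
struct CellSketch
{
    bool bDirty = false;
    std::size_t nCalls = 0; // how often the per-cell code was entered
};

// Hypothetical stand-in for ScDocument.
struct SheetDocSketch
{
    bool bGeneratedByLibreOffice = false;
    std::vector<CellSketch> aCells;

    // Current approach: every cell is visited and the guard sits at
    // the bottom of the call chain.
    void SetDirtyLowLevelGuard()
    {
        for (CellSketch& rCell : aCells)
        {
            ++rCell.nCalls;
            if (!bGeneratedByLibreOffice)
                rCell.bDirty = true;
        }
    }

    // Possible alternative: return before walking the cells at all.
    void SetDirtyHighLevelGuard()
    {
        if (bGeneratedByLibreOffice)
            return;
        for (CellSketch& rCell : aCells)
        {
            ++rCell.nCalls;
            rCell.bDirty = true;
        }
    }
};
```

Both variants leave the cells clean while the flag is set; the second 
one simply skips the per-cell work, which is only safe if nothing else 
in the chain is needed during import.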

I'll try to see if callgrind will tell me anything with this test document.

Daniel Bankston
