[GSOC] Using cached formula results during ODS import
daniel.dev.libreoffice at gmail.com
Tue Jul 3 14:46:12 PDT 2012
Hi, Kohei and Markus,
I have committed my initial implementation of using imported formula
results during ODS import if the document was generated by LibreOffice.
It's not as clean and symmetric as I would have liked, but it's the best
I have right now. I hope it's not too hackish, ;-) and I await your
valuable comments and criticisms.
If an ScFormulaCell is set to dirty, it will be re-Interpreted
(recalculated). The goal is to avoid this during ODS import of a
LibreOffice-generated document and instead use cached formula results
imported from the document itself to achieve better performance.
I added a new flag in ScDocument (kind of like IsImportingXML) that is
set while the document is importing a LibreOffice-generated ODS
document. In this email, I'll call it the ScDocument
libreoffice-generated-doc flag. The flag is set in
ScXMLImport::setTargetDocument() because I needed a valid ScDocument to
have already been created and I needed access to the XML meta generator.
The flag is always reset in ScDocShell::LoadXML() because
ScDocShell::AfterXMLLoading() will try to set ScFormulaCells to dirty.
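A rough sketch of the generator check, not the code from the commit. It assumes that the meta generator string of a LibreOffice-written file starts with "LibreOffice/"; that prefix and the function name isLibreOfficeGenerator are assumptions made for illustration, and the real check may need to be stricter:

```cpp
#include <string>

// Hypothetical helper: returns true if the ODF meta generator string looks
// like it was written by LibreOffice. The "LibreOffice/" prefix is an
// assumption about the generator format, not something taken from the commit.
bool isLibreOfficeGenerator(const std::string& generator) {
    static const std::string kPrefix = "LibreOffice/";
    // compare() on a shorter string simply reports a mismatch, so no
    // explicit length check is needed.
    return generator.compare(0, kPrefix.size(), kPrefix) == 0;
}
```

The result of such a check is what would be stored in the ScDocument libreoffice-generated-doc flag for the duration of the import.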
If the ScDocument libreoffice-generated-doc flag is set,
ScXMLTableRowCellContext::EndElement()'s callees will set each newly
created ScFormulaCell to not dirty (a newly created ScFormulaCell is set
to dirty by default).
At several points during the import process, the code tries to set all
ScFormulaCells to dirty. To prevent this,
ScFormulaCell::SetDirtyVar() will not set a ScFormulaCell to dirty if
the ScDocument libreoffice-generated-doc flag is set.
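To illustrate the mechanism, here is a minimal standalone sketch. Document and FormulaCell are simplified stand-ins, not the real ScDocument and ScFormulaCell, and every member name (importingLibreOfficeDoc, useCachedResult, and so on) is hypothetical:

```cpp
// Simplified stand-in for ScDocument; only carries the new flag.
struct Document {
    bool importingLibreOfficeDoc = false;  // assumed name for the flag
};

// Simplified stand-in for ScFormulaCell.
class FormulaCell {
public:
    FormulaCell(Document& doc, double cachedResult)
        : doc_(doc), result_(cachedResult) {}

    // Called right after the cell is created during import: trust the
    // cached result only while the flag is set.
    void useCachedResult() {
        if (doc_.importingLibreOfficeDoc) dirty_ = false;
    }

    // Lowest-level guard: refuse to mark the cell dirty during the import.
    void setDirtyVar() {
        if (doc_.importingLibreOfficeDoc) return;
        dirty_ = true;
    }

    // Recalculates only when the cell is dirty.
    double value() {
        if (dirty_) {
            result_ = interpret();
            dirty_ = false;
        }
        return result_;
    }

    bool isDirty() const { return dirty_; }
    int interpretCount() const { return interpretCount_; }

private:
    // Placeholder for the expensive recalculation; returns a dummy value.
    double interpret() {
        ++interpretCount_;
        return 42.0;
    }

    Document& doc_;
    double result_;
    bool dirty_ = true;  // a newly created cell is dirty by default
    int interpretCount_ = 0;
};
```

With the flag set, repeated setDirtyVar() calls leave the cell clean and value() returns the cached result without interpreting. Once the flag is reset, the cell behaves as before.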
I still need to implement checking for special cases of functions, such
as NOW(), where we always want to recalculate the formula result.
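One possible shape for that check, as a sketch only. It scans the formula text for a short list of function names; the list, the function name containsVolatileCall, and the text-scanning approach are all assumptions for illustration. A real implementation would more likely inspect the compiled formula tokens, and this version does not skip string literals inside the formula:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the formula text calls one of a few
// functions whose result should always be recalculated on load.
bool containsVolatileCall(const std::string& formula) {
    // Illustrative list only; not taken from the LibreOffice sources.
    static const char* const kVolatile[] = {"NOW", "TODAY", "RAND", "RANDBETWEEN"};

    // Function names are case-insensitive, so compare in upper case.
    std::string upper(formula);
    for (char& c : upper)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));

    for (const char* raw : kVolatile) {
        const std::string name(raw);
        std::size_t pos = 0;
        while ((pos = upper.find(name, pos)) != std::string::npos) {
            const std::size_t end = pos + name.size();
            // The match must start a word (so KNOW( does not count) ...
            const bool startsWord =
                pos == 0 ||
                (!std::isalnum(static_cast<unsigned char>(upper[pos - 1])) &&
                 upper[pos - 1] != '_');
            // ... and be followed directly by an opening parenthesis.
            const bool isCall = end < upper.size() && upper[end] == '(';
            if (startsWord && isCall) return true;
            pos = end;
        }
    }
    return false;
}
```

Cells for which this returns true would be left dirty during import so that they are recalculated as usual.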
Here are import times before and after my commit
7a0fba0b0225f59f8c38b245cb21b81750271e26, using Markus's large matrix
test file with complicated functions (single sheet with approx. 300 cols
x 5200 rows), on an --enable-symbols libo build on a machine with an AMD
Athlon 64 X2 Dual Core 6400 @ 3.2GHz and 8GB RAM running 64-bit
GNU/Linux:
Before commit: 22 seconds
After commit: 21 seconds
Really, it's probably less than a second of improvement, which is not so great.
Two areas that I think MAY help improve performance a little:
1) Since the goal was to just stop Interpret() from being called, my
current implementation allows the whole SetDirty() call chain to call
all the way down to the lowest level, which is
ScFormulaCell::SetDirtyVar(), before preventing the ScFormulaCell
object's dirty flag from being set. Performance may improve a bit if I
stop this call chain at a higher level, preventing the overhead of the
calls and peripheral logic taking place in the call chain. I just have
to make sure that this peripheral logic isn't required.
2) ScXMLImport::setTargetDocument() is called for each tab in the
document which means the meta generator is checked and the ScDocument
libreoffice-generated-doc flag is set for each tab. Maybe I can find a
way to do this once while still having access to the XML and a valid
ScDocument. However, this probably won't affect performance much at
all, especially in the test document I was using.
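To make point 1) concrete, here is a small standalone sketch comparing the two guard placements. Doc and Cell are simplified stand-ins for ScDocument and ScFormulaCell, and all names are hypothetical. It only counts per-cell calls and says nothing about whether the skipped peripheral logic is actually safe to skip:

```cpp
#include <cstddef>
#include <vector>

struct Cell {
    bool dirty = false;
};

struct Doc {
    bool importingLibreOfficeDoc = false;  // assumed name for the flag
    std::vector<Cell> cells;
    std::size_t perCellCalls = 0;  // counts how often the low level is reached

    // Low-level guard, as in the current approach: the call still happens
    // for every cell, and only the final flag write is skipped.
    void setDirtyVar(Cell& cell) {
        ++perCellCalls;
        if (importingLibreOfficeDoc) return;
        cell.dirty = true;
    }

    // The whole chain runs down to setDirtyVar() for each cell.
    void setAllDirtyLowLevelGuard() {
        for (Cell& c : cells) setDirtyVar(c);
    }

    // High-level guard: bail out before iterating over the cells at all.
    void setAllDirtyHighLevelGuard() {
        if (importingLibreOfficeDoc) return;
        for (Cell& c : cells) setDirtyVar(c);
    }
};
```

Both variants leave every cell clean while the flag is set, but the high-level guard never reaches the per-cell code, which is where any saving would come from.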
I'll try to see if callgrind will tell me anything with this test document.