New Calc Internal Code Feature: An API for dumping and reloading a sheet's content to a (file) stream

Markus Mohrhard markus.mohrhard at googlemail.com
Wed Aug 16 14:43:04 UTC 2017


Hey,

I wanted quickly present the new Calc content cache that I just added. The
last round of patches has just been pushed for gerrit builds. I hope this
code is also useful for someone else.


* What it is?

The code is a fast and simple binary storage of a ScTable's cell content.
It is meant to store the cell content (values, strings and formulas) in a
format that allows quickly dumping the content and reading it back. I will
need it soon for the external data feature as a way to cache transformation
steps on disk.

The format of the cache is designed with speed in mind and is close to the
structure of how mdds::multi_type_vector stores our cell content. Therefore
content is stored in a column orientated direction in contrast to the
normal row format of ODF and OOXML.


* What it is not?

Don't use this code if you need to store data long term or if you need more
than storing the cell content (strings, values, formulas). Note that
editengine strings are not supported right now (and I don't plan to add
support for them at all).

The format is not stable and should not be used for anything except
debugging and possibly caching of data.

* How to use it?

The API is available through ScDocument with the two new functions
ScDocument::StoreTabToCache and ScDocument::RestoreTabFromCache with both
functions taking a SvStream argument (e.g. SvFileStream, SvMemoryStream).
The code automatically limits the exported range in a sheet to the actually
used data range and empty blocks are quite cheap.

Additionally, as the format is binary it is not easy to read so I added a
python script that is able to parse the generated content to
sc/workben/cache.py. The script will print a python dictionary with the
whole cell content in the format {column_number: {row_number: "content" }}.

An example from one of my unit tests:
{0: {0: ('=RC[1]', 'formula group length 4'), 4: ('=R[-4]C[1]', 'formula
group length 1')}}

As can be noted formulas are stored as R1C1 formulas and only the top cell
of a formula group.

Besides my planned use in the cache I think this code might be able to help
in some complex debugging scenarios where you need to quickly check the
content of a sheet in a place that you normally are not able visually
inspect the spreadsheet (e.g. debugging of complex sorting operations).


As always comments and feedback appreciated.

Regards,
Markus
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/libreoffice/attachments/20170816/581f465d/attachment.html>


More information about the LibreOffice mailing list