Fw: benchmark of Excel, Calc, Google Docs

Sun Dec 22 02:09:32 UTC 2019

Michael, just checking in to see what might be a good time to chat. We're
excited to connect!

Aditya

On Fri, Dec 13, 2019 at 2:22 PM Aditya Parameswaran <adityagp at berkeley.edu>
wrote:

> Michael,
>
> We'd love to meet and discuss!  Unfortunately, a lot of us are off for
> break starting next week so it might be best to sync up early next year.
> Would the week of the 6th work for you? 8am PT/10am CT/4pm GMT any day
> should work!
>
>> > We started by having the relational database be a simple persistent
>> > storage layer which, when coupled with an index to retrieve data by
>> > position, allows us to scroll through datasets of billions of rows
>> > with ease. We developed a new positional index to handle insertions and
>> > deletions in O(log(n)) -- https://arxiv.org/pdf/1708.06712.pdf. I agree
>> > that pushing the computation to the relational database does have
>> > overheads; but at the same time, it allows for scaling to arbitrarily
>> > large datasets.
>>
>>         Ooh - nice paper. Your crawled data-set looks quite interesting
>> too; we run wide-scale crash-testing on the LibreOffice code-base across
>> ~100k files, and enlarging our corpus there, or better, getting some
>> statistical view of which OOXML attributes (and thus features) are most
>> used out there, would be extremely useful to us as we develop the core.
>>
>>         I like the data on spreadsheet and formula shape - that is very
>> useful. Do you have data on the geometry of formulae - as in rows vs.
>> columns? [ we switched to columnar storage based mostly on experience
>> rather than hard data ;-].
>>
>>         It is also interesting to have access to very large (1.3m row)
>> data-sets that can have useful analysis done on them - would love to see
>> the source data there.
>>
>
> Again, this is something that we'd be happy to share; this might just take
> a bit more work since it's an older codebase.
> I believe we did use the geometry of the formulae to determine the best
> storage representation, so it's there somewhere :-)
>
>>         Sounds good, cf. above - if we can't make that - early in the new
>> year would be great.
>>
>>         I look forward to talking,
>>
>
> Likewise!
>
> Aditya
>