[Libreoffice-bugs] [Bug 109097] New: mdds accelerating random lookups

bugzilla-daemon at bugs.documentfoundation.org
Thu Jul 13 11:25:37 UTC 2017


https://bugs.documentfoundation.org/show_bug.cgi?id=109097

            Bug ID: 109097
           Summary: mdds accelerating random lookups
           Product: LibreOffice
           Version: 5.3.3.1 rc
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: medium
         Component: Calc
          Assignee: libreoffice-bugs at lists.freedesktop.org
          Reporter: michael.meeks at collabora.com

mdds provides excellent, compact, high-performance storage for many
cases. However, we have a few files where the performance of lookups
in mdds could be improved. These involve calculating formulae in
sheets whose columns hold rather sparse data: many blocks of blanks
interspersed with data.

I understand that the Calc team has spent a lot of time fixing the
root causes of many of these issues, by using better iterators down
columns and so on. For this specific case, though, better "random
access" performance could be extremely helpful.

        ScColumn::GetCellValue

accounts for the majority of this cost; it is ultimately called from
ScInterpreter.

I wonder whether we could in fact prototype this inside
ScColumn::GetCellValue. Thoughts appreciated. Hopefully this would fix
the main random-access problem while keeping examples of poor
iteration visible in the profiles.

Code pointers:

workdir/UnpackedTarball/mdds/include/mdds/multi_type_vector.hpp
sc/inc/mtvelements.hxx

typedef mdds::multi_type_vector<CellFunc, CellStoreEvent> CellStoreType;

    CellStoreType::const_iterator itBlk = rSrc.begin(), itBlkEnd = rSrc.end();

The problem here being that:

    struct block
    {
        size_type m_size;
        element_block_type* mp_data;
    };

So each block has no idea of its absolute position. That makes a lot
of sense for many cases, such as mutating the sheet by inserting rows.
Otherwise we could store the absolute position, and derive the size as
the difference to the next (or terminating) block.
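
To make the trade-off concrete, here is a minimal standalone sketch. It
is not mdds code: SizeBlock, PosBlock, find_by_size and find_by_position
are hypothetical names invented for illustration. With size-only blocks,
finding the block for a row means summing sizes from the start of the
column; with stored start positions, a binary search is enough.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the size-only layout: a block knows its length only.
struct SizeBlock
{
    std::size_t m_size;
};

// Hypothetical alternative layout: a block stores the row at which it starts.
struct PosBlock
{
    std::size_t m_position;
};

// O(number of blocks): walk from the first block, accumulating sizes.
// Returns blocks.size() if the row lies past the end of the column.
std::size_t find_by_size(const std::vector<SizeBlock>& blocks, std::size_t row)
{
    std::size_t start = 0;
    for (std::size_t i = 0; i < blocks.size(); ++i)
    {
        if (row < start + blocks[i].m_size)
            return i;
        start += blocks[i].m_size;
    }
    return blocks.size();
}

// O(log number of blocks): binary search on the stored start positions.
// Assumes the first block starts at row 0 and positions are ascending.
std::size_t find_by_position(const std::vector<PosBlock>& blocks, std::size_t row)
{
    assert(!blocks.empty() && blocks.front().m_position == 0);
    // First block that starts after 'row'; the one before it contains 'row'.
    auto it = std::upper_bound(
        blocks.begin(), blocks.end(), row,
        [](std::size_t r, const PosBlock& b) { return r < b.m_position; });
    return static_cast<std::size_t>(it - blocks.begin()) - 1;
}
```

The price of the second layout is the one noted above: inserting rows in
the middle of a column would have to update the stored position of every
later block.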

From an optimization trade-off perspective, optimizing for inserts in
the middle of sheets at the expense of individual cell lookups, or of
file filters appending data, is probably not deliberate.

Of course, failing this, for very 'random' access patterns such as
calculation we could carry around some sort of lookup cache for the
columns we commonly read data from.
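
One possible shape for such a cache, as a rough standalone sketch rather
than a proposal for the actual mdds or Calc code: remember the block
found by the previous lookup and resume the scan from there. The class
CachedBlockLookup and its members are hypothetical, and the column is
modelled as a plain list of block sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-column cache: remembers the block hit by the last lookup.
class CachedBlockLookup
{
public:
    explicit CachedBlockLookup(std::vector<std::size_t> sizes)
        : m_sizes(std::move(sizes))
    {
    }

    // Returns the index of the block containing 'row'.
    std::size_t find(std::size_t row)
    {
        // The cache only helps for rows at or after the cached block;
        // for earlier rows, restart from the first block.
        if (row < m_start)
        {
            m_index = 0;
            m_start = 0;
        }
        // Walk forward from the cached block until 'row' falls inside one.
        while (m_index < m_sizes.size() && row >= m_start + m_sizes[m_index])
        {
            m_start += m_sizes[m_index];
            ++m_index;
            ++m_steps;
        }
        assert(m_index < m_sizes.size() && "row is past the end of the column");
        return m_index;
    }

    // Number of blocks stepped over so far, to show what the cache saves.
    std::size_t steps() const { return m_steps; }

private:
    std::vector<std::size_t> m_sizes; // block sizes, in row order
    std::size_t m_index = 0;          // cached block index
    std::size_t m_start = 0;          // first row of the cached block
    std::size_t m_steps = 0;
};
```

A cache like this only pays off if consecutive lookups tend to land near
each other, and it would have to be reset whenever the column is
mutated.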

Aron, any chance you could add some SAL_DEBUG() lines for that pivot
sheet to print out all the arguments to ScColumn::GetCellValue,
including the column's members nCol and nTab and the nRow passed in?
Hopefully that will give us a big log showing whether there are indeed
any patterns, at least in this case.

And of course, this is probably a duplicate ticket =) Thoughts?
