tdf#50916 : Calc : Dynamic column container

Markus Mohrhard markus.mohrhard at googlemail.com
Sat Oct 10 06:58:41 PDT 2015


Hey Dennis,

On Sat, Oct 10, 2015 at 3:30 PM, Dennis Francis <dennisfrancis.in at gmail.com>
wrote:

> Hi Markus
>
> Thanks for your valuable feedback !
>
> 1.
> """
> A simple high level design could be to use a std::vector<ScColumn*> and
> only allocate columns that really contain content or formatting.
> Additionally we'd need to introduce a way to store the format of a row.
> These row formats might not be visible to the user and would be just an
> internal way to handle the formatting for all not yet allocated columns.
> """
>
> Let me make sure I get this idea right -
>
> The aCol member in ScTable is declared as :
> *std::vector<ScColumn*> aCol;*  // Or can have wrapper on it own
> and in the ScTable ctor aCol is init with a minimum number (or possibly
> 16K) of NULL pointer elements ie no ScColumn instance is allocated yet.
> If we allocate space for all pointers in aCol vector then I think it might
> take around 128 KB just for pointers,
> (point 1.A) so might want to allocate that too just when it is needed,
> hence avoid the need to keep a int var for keeping the index of last non
> null element.
>

Please don't allocate any pointers in advance. There is no value in that
and it just causes higher memory load.



>
> While loading a sheet or user types in something in ith column for the
> first time, allocate ScColumn
>
>
> *if ( ! aCol[i] ){*
>
> *  aCol[i] = new ScColumn;*
>
> *  // Apply any full row formattings to aCol[i]*
>
> *}*
>

As we would not pre allocate the vector it would be more like:

if (i >= aCol.size())
{
    aCol.resize(i+1);
}
if (!aCol[i])
{
   aCol[i] = new ScColumn;
}

>
> 2.
> Correct me if I am wrong - I guess the formatting of each cell in a column
> is stored in pAttrArray member of ScColumn. So for storing full row
> formatting, we need
> a data structure that map from row number to the formatting info.
> Definitely need your inputs and code pointers on this.
>


Either that, which would be quite trivial or a way to have a concept where
row and document formats are handled independently from the column
formatting. E.g instead of our current design that stores a formatting
entry even for a column that has default formatting only store entries for
columns and cells that have been explicitly formatted. So if there is no
column/cell formatting (they are basically the same) we would take the row
formatting and if no row formatting is around take the document formatting.

Which of the two makes more sense would need some more design work.


>
>
>
> This is surely not what we had in mind. Basically you just wrote a wrapper
>> around std::vector.
>>
>> There are a number of items a new design need:
>>
>> * decision whether to store all ScColumn instances or only filled ones
>>
>
> 3.
> If we use just std::vector<ScColumn>, we have to allocate n ScColumns if
> there are totally n columns in the sheet including blank ones upto the last
> non-blank column.
> If we use std::vector<ScColumn*> we need to allocate ScColumn's for only
> the non blank columns in the document - definitely better.
>

The question is how likely it is to find such a case where it matters. The
bigger problem would be that for std::vector<ScColumn> ScColumn needs to be
either movable or copyable. Both are most likely things we are not prepared
to do.


>
>
>> * a way to handle the increased memory load
>> ** most likely limiting the number of initial columns
>>
>
> See point 1.
>

As mentioned there we should only allocate entries that are necessary. So
an empty document basically just needs the size of an empty std::vector.
Also ideas like the row formatting are part of the solution to this problem.


>
> 4.
>
>> * a way to handle the performance impact of many columns
>> ** most likely improving the iterations through all columns
>>
>
> If we use std::vector<ScColumn>, We can trim the for loops to run from 0
> up to current length of aCol.
> If we use std::vector<ScColumn*>, We can trim the for loops to run from 0
> up to current length of aCol and also skip running the inner logic whenever
> aCol[i] == NULL, there by saving time, but not sure we can convert all
> column for loops like that offhand.
>

You need to skip vector entries that are NULL anyway.

Obviously you need to convert all loops as you have some problems otherwise.


>
>
>>
>> * a better way to handle row formattings
>> ** what happens if someone marks a whole row and formats it
>> ** how to handle formatting for columns that were allocated after the
>> user formatted the whole row
>>
>> See point 2.
>
>
>> * most likely a way to store the last formatted column and the last
>> column with content
>> ** needs some inspecting which loops through the columns need which of
>> the information
>>
>> See point 1.A, using this we can save time wasted on iterating over blank
> columns(aCol[i] == NULL).
> Could you please explain why we need to remember "last formatted column" ?
>

Depending on the design this happens directly or you need to do it
indirectly. These comments where general comments without looking at any
implementation. In the implementation with the std::vector<ScColumn*> this
happens indirectly though std::vector<ScColumn*>size().

 The idea to store the last column with content might help us avoid looping
through a lot of columns that are only formatted. It is not yet sure if
that is really necessary.


>
>> Most likely a few more that I have not in mind right now.
>>
>
> Will surely wait for others to comment before proceeding any further.
>


The proposed design is based on a discussion between Kohei and met. Maybe
Eike has some additional comments otherwise feel free to work on a design.


>
>
> Thanks,
> Dennis
>
> www.ldcs.co.in
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/libreoffice/attachments/20151010/489651e3/attachment.html>


More information about the LibreOffice mailing list