Pivot Table data provider extension framework (removal possibility)

Thu Mar 21 09:05:31 PDT 2013

On Wed, Mar 20, 2013 at 5:20 AM, Eike Rathke <erack at redhat.com> wrote:
> Hi Kohei,
>
> On Thursday, 2013-03-14 09:26:55 -0400, Kohei Yoshida wrote:
>
>> >> I believe the same functionality can be achieved via database
>> >> connectivity, by having such external data provider register as a
>> >> database, and use it to act as a data provider for pivot tables.
>> >> So, I don't see a reason why we need to keep this as a separate data
>> >> source category.
>> >
>> > IMHO the advantage of the data provider is that the actual data does not
>> > have to reside in the spreadsheet, allowing for massive amounts of data
>> > records but providing only the information necessary for the pivot
>> > table. This maybe could be accomplished as well using a registered data
>> > source, but currently we have no means to pull the data without actually
>> > storing it in the spreadsheet for further processing. Or isn't that the
>> > case?
>>
>> Well, that would depend on what you actually mean by "storing (the
>> data) in the spreadsheet". When pulling data via database
>> connectivity, we don't actually copy the data in the spreadsheet
>> document, but generate the pivot table output directly from it. But we
>> *do* first populate the pivot cache from the database internally, so a
>> copy of the data will sit in memory while the document is open.
>
> That's my bad then. I assumed the data was stored in a DB range.
>
> Is that different with the data provider, i.e. does it not need to copy
> all data to populate the pivot cache with an interface to directly
> populate the layouted pivot table?

Well, that's how it is implemented today. It's not by design but due
to how this feature has evolved historically. This data provider
interface was designed and put in place *before* we added this pivot
cache backend.

This difference actually causes additional headaches, since we can't
always assume that the pivot cache is populated, which ties our hands
in many places in the pivot engine.

> Other advantages a data provider could have are a) be able to collect
> data from various e.g. remote sources that a simple data connection
> could not provide,

Yes, but to achieve that, one has to implement the *whole pivot result
calc engine*. To me that's overkill, just to avoid implementing a
simple data connectivity backend. It would be much simpler to just
write a data connectivity backend and re-use the database connectivity
backend of the pivot table.
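
As a rough illustration of that argument (a sketch only; every class and
function name below is hypothetical and not taken from the LibreOffice
code base), a connectivity-style backend only has to hand over field
names and rows. The cache and the result calculation are shared code that
the backend author never touches:

```python
from collections import defaultdict
from typing import Iterable, Protocol, Sequence


class RowSource(Protocol):
    """Hypothetical minimal backend: it only has to hand over rows."""

    def field_names(self) -> Sequence[str]: ...

    def rows(self) -> Iterable[Sequence[object]]: ...


class PivotCache:
    """Hypothetical in-memory copy of the source data, filled once."""

    def __init__(self, source: RowSource) -> None:
        self.fields = list(source.field_names())
        self.records = [tuple(r) for r in source.rows()]


def pivot_sum(cache: PivotCache, row_field: str, data_field: str) -> dict:
    """Toy stand-in for the shared result engine: sum one field per group."""
    r = cache.fields.index(row_field)
    d = cache.fields.index(data_field)
    totals = defaultdict(float)
    for rec in cache.records:
        totals[rec[r]] += rec[d]
    return dict(totals)


class RemoteSalesSource:
    """Stand-in for an external source; a real one would fetch remotely."""

    def field_names(self):
        return ["Region", "Amount"]

    def rows(self):
        return [("East", 10.0), ("West", 5.0), ("East", 2.5)]


cache = PivotCache(RemoteSalesSource())
result = pivot_sum(cache, "Region", "Amount")
```

In this sketch a new source means writing only the two methods of
`RowSource`; nothing in `pivot_sum` has to be reimplemented.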

> and b) access data in means not possible with
> database connectivity, for example if the user shall be restricted to
> a subset of a database or not be able to query using SQL statements.

Sure. But I'm sure that could be implemented via some sort of data
connectivity proxy, which to me would be much simpler than developing
the entire result calc engine from scratch.
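
One way to picture such a proxy (again a sketch under assumptions; the
names `ListSource` and `RestrictedSource` are made up for illustration)
is a thin wrapper that exposes only an allowed subset of fields and rows
and offers no SQL entry point at all:

```python
from typing import Callable, Dict, Iterable, Iterator, Sequence, Set, Tuple


class ListSource:
    """Hypothetical stand-in for a full database connection."""

    def __init__(self, fields: Sequence[str],
                 rows: Iterable[Sequence[object]]) -> None:
        self._fields = list(fields)
        self._rows = [tuple(r) for r in rows]

    def field_names(self) -> list:
        return list(self._fields)

    def rows(self) -> Iterator[Tuple[object, ...]]:
        return iter(self._rows)


class RestrictedSource:
    """Hypothetical proxy: same interface, but only a permitted subset."""

    def __init__(self, inner: ListSource, allowed_fields: Set[str],
                 row_filter: Callable[[Dict[str, object]], bool]) -> None:
        self._inner = inner
        self._allowed = set(allowed_fields)
        self._row_filter = row_filter

    def field_names(self) -> list:
        # Hide every field the user is not allowed to see.
        return [n for n in self._inner.field_names() if n in self._allowed]

    def rows(self) -> Iterator[Tuple[object, ...]]:
        names = self._inner.field_names()
        keep = [i for i, n in enumerate(names) if n in self._allowed]
        for row in self._inner.rows():
            # The filter sees the full record; the caller sees allowed fields.
            if self._row_filter(dict(zip(names, row))):
                yield tuple(row[i] for i in keep)


full = ListSource(["Region", "Amount", "Salary"],
                  [("East", 10, 100), ("West", 5, 200)])
proxy = RestrictedSource(full, {"Region", "Amount"},
                         lambda rec: rec["Region"] == "East")
```

Because the proxy presents the same two methods as the source it wraps,
whatever consumes the rows would not need to know the restriction exists.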

> Probably there'll always be _some_ use cases such a provider could have
> (does Excel have that? if yes then there are ...),

Unless I missed something (someone could enlighten me on this), Excel
only provides MS SQL connectivity, which is the equivalent of our
connectivity backend.

> so if it's ripped out
> maybe offering a new interface adapted to the new data types and
> structures that sits on top of the engine instead of being part of it
> would be good.

Sure. But to justify this enormous design constraint, I'd like to hear
from the actual users / deployers about why this special data provider
was needed in the first place, and whether that requirement still
justifies the complexity it imposes on 100% of pivot table users,
including those who don't use this data provider backend (which I
imagine constitutes 99.9% of all pivot table users).

Kohei