Reading parquet files in Calc
Kohei Yoshida
kohei at libreoffice.org
Thu Oct 6 23:06:39 UTC 2022
Hi Jacek,
On 06.10.2022 05:54, Jacek Pliszka wrote:
> I found an old thread about adding it to Orcus library instead.
>
> Is it the best approach?
It is an approach. But I wouldn't say it's the best approach. The orcus
library has traditionally been geared more toward supporting text-based
file formats, such as csv, xlsx, ods, gnumeric, etc., whereas my
understanding of the parquet file format is that it is a binary file format.
> If Orcus could use arrow library then it should be relatively easy.
> similar to .csv files.
Yes, I believe that's doable. Having said that, it's my understanding
that the arrow library provides a nice abstraction optimized for
columnar in-memory formats. So, if we were to use it in orcus, which is
not necessarily optimized for columnar in-memory formats, we may lose
some efficiency simply by having to go through two layers of
abstraction, each with a different focus. Someone would need to take
a closer look at the design of the arrow library and decide which
approach makes more sense: using it in orcus or using it directly in the
libreoffice codebase.
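
To make the concern about two layers a bit more concrete, here is a minimal
sketch in plain Python. It does not use the real arrow or orcus APIs; the
names ColumnarTable, CellSink, ColumnSink, import_via_cells and
import_via_columns are all hypothetical and exist only for illustration. It
assumes the columnar data is already in memory and compares pushing it
through a cell-by-cell interface with handing over whole columns.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a columnar in-memory table:
# column name -> list of values. This is NOT the arrow API.
ColumnarTable = Dict[str, List[object]]


class CellSink:
    """Hypothetical cell-by-cell import interface (not the real orcus API)."""

    def __init__(self) -> None:
        self.cells: Dict[Tuple[int, int], object] = {}
        self.calls = 0

    def set_cell(self, row: int, col: int, value: object) -> None:
        self.calls += 1
        self.cells[(row, col)] = value


class ColumnSink:
    """Hypothetical consumer that accepts a whole column at a time."""

    def __init__(self) -> None:
        self.columns: List[List[object]] = []
        self.calls = 0

    def set_column(self, col: int, values: List[object]) -> None:
        self.calls += 1
        # Columns arrive in order here, so appending keeps index == col.
        self.columns.append(list(values))


def import_via_cells(table: ColumnarTable, sink: CellSink) -> None:
    # The columnar layout is unpacked into one call per cell.
    for col, values in enumerate(table.values()):
        for row, value in enumerate(values):
            sink.set_cell(row, col, value)


def import_via_columns(table: ColumnarTable, sink: ColumnSink) -> None:
    # The columnar layout is preserved: one call per column.
    for col, values in enumerate(table.values()):
        sink.set_column(col, values)


if __name__ == "__main__":
    table: ColumnarTable = {"id": [1, 2, 3], "name": ["a", "b", "c"]}

    cell_sink = CellSink()
    import_via_cells(table, cell_sink)

    column_sink = ColumnSink()
    import_via_columns(table, column_sink)

    print(cell_sink.calls, column_sink.calls)  # prints "6 2"
```

Both paths end up with the same data; the difference is only in how many
times the boundary between the two layers is crossed. Whether that matters
in practice depends on the actual designs of both libraries, which is
exactly what would need a closer look.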
I would have been very happy to take a closer look at the arrow library.
But right now I'm trying to finish up all the features that need to go
into the next release of orcus, so I won't be able to do that anytime
soon, unfortunately.
Kohei