Reading parquet files in Calc

Kohei Yoshida kohei at libreoffice.org
Thu Oct 6 23:06:39 UTC 2022


Hi Jacek,

On 06.10.2022 05:54, Jacek Pliszka wrote:

> I found an old thread about adding it to Orcus library instead.
> 
> Is it the best approach?

It is an approach. But I wouldn't say it's the best approach.  Orcus 
library has traditionally been geared more toward supporting text-file 
based file formats, such as csv, xlsx, ods, gnumeric etc., whereas my 
understanding of parquet file format is that it is a binary file format.

> If Orcus could use arrow library then it should be relatively easy.
> similar to .csv files.

Yes, I believe that's doable.  Having said that, it's my understanding 
that the arrow library provides a nice abstraction optimized for 
columnar in-memory formats. So, if we were to use it in orcus, which is 
not necessarily optimized for columnar in-memory formats, we may lose 
some efficiency just by having to potentially go through two layers of 
abstraction that both have different focus.  Someone would need to take 
a closer look at the design of the arrow library and decide which 
approach makes more sense: using it in orcus or using it directly in the 
libreoffice codebase.
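
As a rough illustration of the two-layer concern, here is a minimal C++ sketch. Every type and function name in it (ColumnarTable, CellSink, ColumnStore, import_via_cells, import_via_columns) is hypothetical and made up for this example; none of it is the actual arrow or orcus API. It only assumes that one side holds data column by column while the intermediate layer accepts one cell at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a columnar in-memory table: one contiguous
// array per column. This is NOT the arrow API.
struct ColumnarTable
{
    std::vector<std::vector<double>> columns;
};

// Hypothetical cell-by-cell import interface. This is NOT the orcus API.
class CellSink
{
public:
    virtual ~CellSink() = default;
    virtual void set_value(std::size_t row, std::size_t col, double value) = 0;
};

// Hypothetical document store that also keeps its data column by column.
class ColumnStore : public CellSink
{
public:
    std::vector<std::vector<double>> columns;
    std::size_t cell_calls = 0; // number of per-cell virtual calls received

    void set_value(std::size_t row, std::size_t col, double value) override
    {
        ++cell_calls;
        if (col >= columns.size())
            columns.resize(col + 1);
        if (row >= columns[col].size())
            columns[col].resize(row + 1);
        columns[col][row] = value;
    }

    // Direct path: accept a whole column in one step.
    void set_column(std::size_t col, const std::vector<double>& values)
    {
        if (col >= columns.size())
            columns.resize(col + 1);
        columns[col] = values;
    }
};

// Path 1: columnar data is taken apart and pushed through the
// cell-by-cell layer, one virtual call per value.
void import_via_cells(const ColumnarTable& table, CellSink& sink)
{
    for (std::size_t col = 0; col < table.columns.size(); ++col)
        for (std::size_t row = 0; row < table.columns[col].size(); ++row)
            sink.set_value(row, col, table.columns[col][row]);
}

// Path 2: columnar data is handed over one column at a time.
void import_via_columns(const ColumnarTable& table, ColumnStore& store)
{
    for (std::size_t col = 0; col < table.columns.size(); ++col)
        store.set_column(col, table.columns[col]);

    // Postcondition: the store has at least as many columns as the source.
    assert(store.columns.size() >= table.columns.size());
}
```

Both paths end with the same data in the store. The first makes one call per cell, while the second makes one call per column, which is the kind of difference that would have to be weighed when looking at the real libraries.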

I would have been very happy to take a closer look at the arrow library. 
But right now I'm trying to finish up all the features that need to go 
into the next release of orcus, so I won't be able to do that anytime 
soon unfortunately.

Kohei

