[GSoC] Discussion on project idea "interface for external data source import into Calc"

Markus Mohrhard markus.mohrhard at googlemail.com
Thu Mar 8 22:01:37 UTC 2018

Hello Manuj,

On Wed, Mar 7, 2018 at 6:48 PM, Manuj Vashist <manujvashist at gmail.com>
wrote:

> Hello everyone,
> I am a sophomore student pursuing a B.E. at Birla Institute of Technology &
> Science, Pilani.
> I have been exploring the LO code base since January and have merged a
> couple of easyHacks too :)
> I would like to work on the project idea "Implement interface for
> external data source import into Calc
> <https://wiki.documentfoundation.org/Development/GSoC/Ideas#Implement_interface_for_external_data_source_import_into_Calc>"
> this summer as a GSoC student.
> The currently available dialog imports data from other csv files and
> html web pages, and the project is about extending the existing data
> providers and data transformations.
> I can think of data providers like a SQL table that could be added to
> it. Please give some more information on what kind of data transformation
> is referred to here.
> Also, there are two dialogs doing the same thing here: the link to
> external data dialog and the data provider dialog. What is the use case
> for having two different dialogs? Can't both be merged together?
> A bit more info on the project would be helpful.

The idea is similar to PowerQuery for Excel but with a more limited focus.
As a simple example, take the stock data of the last week that is published
on some website and that we would like to integrate into our spreadsheet.
Currently this happens by downloading the csv file (hopefully the data is
in csv format or another spreadsheet format) and either copying the data or
using the link to external data. Neither feature handles updating or
transforming the data very well.

The idea now is to take all the different ways that we have to import
external data (link to external data, xml source, data streaming) and
combine them in one common feature. To make working with the external data
easier we also want to be able to apply simple transformations to the data
before importing them (like deleting a column, applying a filter,
sanitizing data, ...). The concept that I already started is to have a
second hidden document with a sheet that we use to import the data and then
apply the transformations before finally copying the data from the hidden
document to the final document. Currently the data is always imported into
a database range (Data->Define) that stores the range of the imported data.
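
As a rough illustration of the hidden-document concept described above, the
flow could be sketched in Python as follows. This is not the Calc code: the
function names, the list of rows standing in for the hidden sheet, and the
dict standing in for the visible sheet are all assumptions made up for the
sketch.

```python
import csv
import io

# Hypothetical names throughout; this is not the LibreOffice implementation.

def fetch_csv(text):
    """Provider step: parse CSV text into a list of rows (the 'hidden sheet')."""
    return [row for row in csv.reader(io.StringIO(text))]

def delete_column(index):
    """Transformation: drop the column at the given index from every row."""
    def apply(rows):
        return [row[:index] + row[index + 1:] for row in rows]
    return apply

def keep_rows(predicate):
    """Transformation: keep the header row plus data rows matching predicate."""
    def apply(rows):
        return rows[:1] + [row for row in rows[1:] if predicate(row)]
    return apply

def import_data(text, transformations, target, top, left):
    """Import into a scratch table, transform it, then copy it into target.

    target is a dict mapping (row, col) to a cell value and stands in for the
    visible sheet. Returns the covered range as (top, left, bottom, right),
    similar in spirit to the database range that records the imported area.
    """
    scratch = fetch_csv(text)
    for transform in transformations:
        scratch = transform(scratch)
    for r, row in enumerate(scratch):
        for c, value in enumerate(row):
            target[(top + r, left + c)] = value
    width = max((len(row) for row in scratch), default=0)
    return (top, left, top + len(scratch) - 1, left + width - 1)
```

The point of the scratch table is that the visible sheet only ever sees the
finished result; the raw download and the intermediate states stay out of
the user's document.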

You can already test some parts of the feature in current master by
enabling experimental features and then going to Data->Data Provider. The
implementation right now does most of the work in its own thread to allow
slow data fetching and transformations while keeping the UI responsive.
Most of the current code can be found at sc/source/ui/dataprovider/* except
for the ugly UI that is currently available (and which needs to be made
more user friendly) at sc/source/ui/miscdlgs/dataproviderdlg.cxx.
Additionally there are some initial tests at sc/qa/unit/dataproviders_test.cxx
and sc/qa/unit/datatransformation_test.cxx that I used to prototype some of
the code.
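
The threading idea mentioned above (fetch and transform off the main thread,
keep the UI responsive) might look roughly like this. It is a minimal sketch
using Python's standard threading and queue modules, not the approach taken
in sc/source/ui/dataprovider; start_import and poll are hypothetical names.

```python
import queue
import threading

def start_import(fetch, transformations):
    """Run fetch and the transformations on a worker thread.

    Returns a queue that will receive exactly one item: ("ok", rows) on
    success or ("error", exception) on failure. The worker never touches
    the document itself.
    """
    results = queue.Queue()

    def worker():
        try:
            rows = fetch()
            for transform in transformations:
                rows = transform(rows)
            results.put(("ok", rows))
        except Exception as exc:
            results.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    return results

def poll(results):
    """Called from the main (UI) loop: return the result if ready, else None."""
    try:
        return results.get_nowait()
    except queue.Empty:
        return None
```

The main loop would call poll() periodically and only copy the rows into the
document once a result is available, so a slow download never blocks it.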

Work during GSoC may include work on an improved UI for the feature, new
data providers and data transformations, storing the information about data
providers and transformations in files (ods and possibly xlsx), adding a
UNO interface to allow extension authors to add their own data providers
and data transformations, and many more features that may be interesting.
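
To make two of these ideas more concrete (an extension point for new
transformations, and storing the configuration in a file), here is a small
Python sketch. It is only an analogy: the real work would go through UNO and
the ods/xlsx file formats, whereas this uses a plain registry dict and JSON.
All names in it are hypothetical.

```python
import json

# Registry of transformation factories, keyed by a name that can be stored
# in a file. Stand-in for an extension interface; not an actual UNO API.
TRANSFORMATIONS = {}

def register(name):
    """Decorator that adds a transformation factory to the registry."""
    def decorator(factory):
        TRANSFORMATIONS[name] = factory
        return factory
    return decorator

@register("delete_column")
def delete_column(index):
    return lambda rows: [row[:index] + row[index + 1:] for row in rows]

@register("sort_by_column")
def sort_by_column(index):
    return lambda rows: sorted(rows, key=lambda row: row[index])

def save_pipeline(url, steps):
    """Serialize a source URL and a list of (name, args) steps to JSON text."""
    return json.dumps(
        {"url": url, "steps": [{"name": n, "args": a} for n, a in steps]}
    )

def load_pipeline(text):
    """Rebuild the source URL and the callable transformations from JSON text."""
    data = json.loads(text)
    transforms = [TRANSFORMATIONS[s["name"]](**s["args"]) for s in data["steps"]]
    return data["url"], transforms
```

Storing only names and arguments, rather than the transformed data, is what
would let the document re-fetch the source and replay the same steps later.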

Maybe start by having a look at the already implemented feature and then at
the code. I hope that at least most of the data providers and data
transformations are actually quite simple. Once you are through that, feel
free to ask if you have more questions about some of the other ideas that I
mentioned above.
