Regarding ODF import and Export support for HistogramChart

Regina Henschel rb.henschel at t-online.de
Mon Dec 16 23:51:25 UTC 2024


Hi Devansh,

This answer is long and touches on general areas, so it is not suitable 
as a direct comment on your patch. It contains my ideas about realizing 
ODF import/export.

Hi Michael, I have put you in CC because you can surely say something 
about ODF and correct me where I'm wrong.

Devansh Varshney schrieb am 14.12.2024 um 13:49:
> Thanks, Regina, for such detailed information. This helped me to 
> approach the
> import/export for the Histogram Chart.
> 
> I have added changes to the ODF Export for the Histogram Chart.
> Have to add support for the Import and addition in the RNG file.
> 
> https://gerrit.libreoffice.org/c/core/+/177364
> 
> Would anyone from the community be able to help me by reviewing the PR?
>

I assume that your intention is to implement a histogram chart similar 
to the one in Excel.

You should specify the new chart type as it would be specified in the 
standard. That text can go to our Wiki, linked from 
https://wiki.documentfoundation.org/Development/ODF_Implementer_Notes/List_of_LibreOffice_ODF_Extensions. 
Writing it down helps you to become clear about the functionality and 
helps in writing the UNO information in the idl file. Currently the info 
in the idl file is not detailed enough. As a model, you can look at 
section "19.15 chart:class" in ODF 1.3 
[https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.html] 
and at the corresponding information for Excel: search for histogram on 
site:microsoft.com and look at its specification in [MS-ODRAWXML]. You 
need to extend the above-mentioned List_of_LibreOffice_ODF_Extensions in 
any case.


You must extend the schema. Those changes go to 
https://opengrok.libreoffice.org/xref/core/schema/libreoffice/OpenDocument-v1.4%2Blibreoffice-schema.rng. 
That is missing in your patch.


The histogram chart does not belong to the chart types that are 
specified in the standard. Thus it needs a value for the chart:class 
attribute that has a loext prefix, e.g. chart:class="loext:histogram". A 
schema change is not needed for this value, because the data type for 
the value of this attribute is already 'namespacedToken'.


You have added the 'bin' related information to the <chart:series> 
element. A <chart:plot-area> element can have several <chart:series> 
sub-elements. I guess that you do not want to allow several series in 
the same histogram. Excel does not allow it. Restricting it in the 
schema is difficult. (Or do you have an idea, Michael?) I suggest 
restricting it in the specification text.


You export the labels for the x-axis as loext:BinRange. I would not 
export them at all, for these reasons:
(A) Excel does not export that information.
(B) The chart has a reference to the area of the data source in the 
table. The content of this area might come from an external source, e.g. 
a database engine. When the file is loaded, this data might be refreshed 
and change. Thus the bin labels and their frequency values might no 
longer fit the information that was written to the file when saving.


You write the 'bin' related information as attributes of the 
<chart:series> element. You should consider using one child element 
instead that contains all needed information. That way you can use a 
dedicated context when loading the file. The schema would get one new 
child element for the <chart:series> element and a new section for this 
new element itself. Michael, what do you think?
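
As an illustration only, the child-element idea could look like the 
following sketch. It is plain Python with the standard library, not 
LibreOffice code. The element name loext:histogram-binning and the 
attribute loext:bin-width are placeholders made up for this sketch; the 
real names have to be decided in the specification text.

```python
import xml.etree.ElementTree as ET

CHART = "urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
LOEXT = "urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0"
ET.register_namespace("chart", CHART)
ET.register_namespace("loext", LOEXT)

# Export side: one child element carries all bin-related information.
# "histogram-binning" and "bin-width" are hypothetical names.
series = ET.Element(f"{{{CHART}}}series")
binning = ET.SubElement(series, f"{{{LOEXT}}}histogram-binning")
binning.set(f"{{{LOEXT}}}bin-width", "5")
xml_text = ET.tostring(series, encoding="unicode")


def read_binning(series_element):
    """Import side: return the binning attributes by local name, or None."""
    child = series_element.find(f"{{{LOEXT}}}histogram-binning")
    if child is None:
        return None
    # Strip the "{namespace}" part from each attribute key.
    return {key.split("}")[1]: value for key, value in child.attrib.items()}


parsed = read_binning(ET.fromstring(xml_text))
```

The reader only has to look for one element and can treat a missing 
element as "no histogram binning information", which is roughly what a 
dedicated import context would do.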


Different variations (types) are possible for the histogram chart. You 
need to specify in the text how the bins are calculated, especially how 
'automatic' works and how overflow and underflow bins influence the bin 
intervals.
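
To show the kind of rules the text has to pin down, here is a small 
Python sketch of one possible bin calculation for a given bin width. It 
is not the algorithm of Excel or of the patch. The rules in it are 
assumptions for this sketch: intervals are open at the start and closed 
at the end, regular bins start at the underflow value if one is given 
(otherwise at the minimum), and the minimum itself is counted in the 
first bin.

```python
import math


def histogram_counts(values, bin_width, underflow=None, overflow=None):
    """Return (underflow_count, regular_bin_counts, overflow_count).

    Assumed rules: values <= underflow go to the underflow bin,
    values > overflow go to the overflow bin, regular bins are
    (start, end] intervals of equal width.
    """
    lo = min(values) if underflow is None else underflow
    hi = max(values) if overflow is None else overflow
    # Number of regular bins needed to cover the range; at least one.
    n = max(1, math.ceil((hi - lo) / bin_width))
    counts = [0] * n
    under = over = 0
    for v in values:
        if underflow is not None and v <= underflow:
            under += 1
        elif overflow is not None and v > overflow:
            over += 1
        else:
            # Right-closed bins; the value equal to lo falls into bin 0.
            i = max(0, math.ceil((v - lo) / bin_width) - 1)
            counts[min(i, n - 1)] += 1
    return under, counts, over


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
plain = histogram_counts(data, 3)
clipped = histogram_counts(data, 3, underflow=2, overflow=8)
```

With these assumptions the underflow value shifts the start of all 
regular bins, so the same data and bin width give different intervals. 
That is exactly the kind of dependency the specification text should 
state explicitly.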


You use two attributes for an underflow bin: one for whether such an 
underflow bin exists and one for its value. I think these can be 
combined. In implementation and schema the combined attribute would be 
optional. The specification text then needs to state what is used when 
this attribute is missing. The same applies to overflow. Excel uses the 
data type ST_DoubleOrAutomatic for this.
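
A minimal sketch of how one combined, optional attribute could be read, 
written in Python for illustration. The token "auto" and the rule that a 
missing attribute means "no underflow bin" are assumptions of this 
sketch; the specification text would have to fix both.

```python
def parse_double_or_auto(attr_value):
    """Map an optional attribute value to (mode, number).

    None      -> ("none", None)   attribute missing: no such bin (assumed)
    "auto"    -> ("auto", None)   cut-off is calculated from the data
    "12.5"    -> ("value", 12.5)  explicit cut-off value
    """
    if attr_value is None:
        return ("none", None)
    if attr_value == "auto":
        return ("auto", None)
    # float() raises ValueError for anything that is not a number.
    return ("value", float(attr_value))
```

For example, parse_double_or_auto("12.5") gives ("value", 12.5), so a 
single attribute covers what currently needs a boolean plus a value.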


The 'binCount' and 'binWidth' information is coupled to the chart 
variants FrequencyType=2 'Number of bins (BinCount)' and FrequencyType=1 
'Bin Width'. You write them in all cases. Especially for the variant 
FrequencyType=3 'By Category' the binWidth attribute is meaningless. On 
the other hand, for the variant FrequencyType=1 it is mandatory. Michael, 
do you have a nice idea for the schema?
In the implementation they might both be optional, with an assert if one 
is missing for its corresponding FrequencyType. For UNO it might be 
sufficient to make them optional and mention the dependencies in the text.
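
The dependencies could be checked roughly like this. The sketch is in 
Python and only illustrates the logic; the function name and the 
messages are made up. It assumes that FrequencyType=0 needs neither 
value, which the patch would have to confirm.

```python
def check_binning(frequency_type, bin_width=None, bin_count=None):
    """Return a list of problems for a combination of binning settings."""
    problems = []
    if frequency_type not in (0, 1, 2, 3):
        problems.append("unknown FrequencyType")
    # FrequencyType=1 'Bin Width' needs the width.
    if frequency_type == 1 and bin_width is None:
        problems.append("bin width is required for FrequencyType=1")
    # FrequencyType=2 'Number of bins' needs the count.
    if frequency_type == 2 and bin_count is None:
        problems.append("bin count is required for FrequencyType=2")
    # In all other variants the value has no meaning and need not be written.
    if frequency_type != 1 and bin_width is not None:
        problems.append("bin width has no meaning for this FrequencyType")
    if frequency_type != 2 and bin_count is not None:
        problems.append("bin count has no meaning for this FrequencyType")
    return problems
```

On export the same conditions decide which attribute is written at all; 
on import they correspond to the asserts mentioned above.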


You write the new attributes with XML_NAMESPACE_CHART. It has to be 
XML_NAMESPACE_LO_EXT.


You can use the histogram chart only in ODF extended. The corresponding 
case distinctions are missing in the export.


For attribute and element names, ODF uses a style with natural-language 
terms separated by hyphens. Please keep this style. So instead of an 
attribute loext:histogram-binwidth it should be 
loext:histogram-bin-width. And instead of loext:histo it should be 
loext:histogram.
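
As a small illustration of the naming rule, this Python helper derives a 
hyphenated ODF-style name from a camelCase property name. The helper and 
its parameters are hypothetical, and it does not handle acronyms such as 
"XML"; it only shows the pattern.

```python
import re


def odf_attribute_name(camel_case, prefix="loext", context="histogram"):
    """Turn e.g. 'binWidth' into 'loext:histogram-bin-width'."""
    # Split at each uppercase letter that starts a new word.
    words = re.findall(r"[A-Z]?[a-z0-9]+", camel_case)
    return f"{prefix}:{context}-" + "-".join(w.lower() for w in words)
```

So odf_attribute_name("binWidth") yields loext:histogram-bin-width, the 
spelling suggested above.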


On the one hand you use a UNO property FrequencyType with data type 
short and possible values 0 to 3; on the other hand you assign the 
property value to aFrequencies, which is a Sequence< double >. How does 
that fit together?


For histograms, Excel uses the element CT_Binning (see 2.24.3.7 in 
[MS-ODRAWXML]). That has the attribute intervalClosed, which determines 
which side of the bin interval is closed and which is open. The 
corresponding attribute is missing in your patch.
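
Why this attribute matters can be seen with a value that lies exactly on 
a bin border. The following Python sketch is only an illustration; the 
parameter values "l" and "r" are placeholder names in this sketch for 
"closed at the start" and "closed at the end".

```python
import math


def bin_index(value, start, width, closed_side="r"):
    """Index of the bin that contains value, for bins beginning at start.

    closed_side="r": bins are (a, b]   (closed at the end)
    closed_side="l": bins are [a, b)   (closed at the start)
    """
    offset = (value - start) / width
    if closed_side == "r":
        return math.ceil(offset) - 1
    return math.floor(offset)


# The value 20 lies on the border between the bins 10..20 and 20..30.
right_closed = bin_index(20, start=10, width=10, closed_side="r")
left_closed = bin_index(20, start=10, width=10, closed_side="l")
```

Here right_closed is 0 and left_closed is 1, so without the attribute 
the same file could show different frequencies after a round trip.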


Kind regards,
Regina












