Regarding ODF import and Export support for HistogramChart
Regina Henschel
rb.henschel at t-online.de
Mon Dec 16 23:51:25 UTC 2024
Hi Devansh,
This answer is long and touches on general areas, so it is not suitable
as a direct comment on your patch. It contains my ideas about
implementing ODF import/export.
Hi Michael, I have put you in CC because you can surely say something
with regard to ODF and correct me where I'm wrong.
Devansh Varshney schrieb am 14.12.2024 um 13:49:
> Thanks, Regina, for such detailed information. This helped me to
> approach the
> import/export for the Histogram Chart.
>
> I have added changes to the ODF Export for the Histogram Chart.
> Have to add support for the Import and addition in the RNG file.
>
> https://gerrit.libreoffice.org/c/core/+/177364
>
> Would anyone from the community be able to help me by reviewing the PR?
>
I assume that your intention is to implement a histogram chart similar
to the one in Excel.
You should specify the new chart type as it would be specified in the
standard. That text can go to our Wiki, linked from
https://wiki.documentfoundation.org/Development/ODF_Implementer_Notes/List_of_LibreOffice_ODF_Extensions.
Writing it down helps you get clear about the functionality and helps
in writing the UNO information in the IDL file. Currently the
information in the IDL file is not detailed enough. You can look at
section "19.15 chart:class" in ODF 1.3
[https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.html]
and at the corresponding information for Excel. Search for histogram on
site:microsoft.com and look at its specification in [MS-ODRAWXML]. You
need to extend the above-mentioned List_of_LibreOffice_ODF_Extensions in
any case.
You must extend the schema. Those changes go to
https://opengrok.libreoffice.org/xref/core/schema/libreoffice/OpenDocument-v1.4%2Blibreoffice-schema.rng.
That is missing in your patch.
The histogram chart is not one of the chart types specified in the
standard. Thus it needs a value for the chart:class attribute that has
a loext prefix, e.g. chart:class="loext:histogram". A schema change is
not needed for this value, because the data type of this attribute's
value is already 'namespacedToken'.
You have added the 'bin' related information to the <chart:series>
element. A <chart:plot-area> element can have several <chart:series>
sub-elements. I guess that you do not want to allow several series in
the same histogram. Excel does not allow it. Restricting it in the
schema is difficult. (Or do you have an idea, Michael?) I suggest
restricting it in the specification text.
You export the labels for the x-axis as loext:BinRange. I would not
export them at all for these reasons:
(A) Excel does not export that information.
(B) The chart has a reference to the area of the data source in the
table. The content of this area might come from an external source, e.g.
a database engine. When the file is loaded, this data might be refreshed
and change. Thus the bin labels and their frequency values might no
longer match the information that was put into the file when saving.
You write the 'bin' related information as attributes of the
<chart:series> element. You should consider using one child element
instead that contains all needed information. That way you can use a
dedicated context when loading the file. The schema would get one new
child element for the <chart:series> element and a new section for this
new element itself. Michael, what do you think?
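To make the child-element idea concrete, here is a minimal sketch in
Python that only builds the XML. The element name
loext:histogram-binning and the attribute name loext:bin-width are
hypothetical and chosen for illustration; the real names have to be
decided in the specification text and the schema.

```python
import xml.etree.ElementTree as ET

CHART_NS = "urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
LOEXT_NS = "urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0"
ET.register_namespace("chart", CHART_NS)
ET.register_namespace("loext", LOEXT_NS)


def build_series(bin_width):
    """Build a <chart:series> with a single child holding all bin information."""
    series = ET.Element(f"{{{CHART_NS}}}series")
    # Hypothetical element and attribute names, for illustration only.
    binning = ET.SubElement(series, f"{{{LOEXT_NS}}}histogram-binning")
    binning.set(f"{{{LOEXT_NS}}}bin-width", str(bin_width))
    return series


xml_text = ET.tostring(build_series(2.5), encoding="unicode")
print(xml_text)
```

With such a layout the import side needs only one context class for the
child element, instead of picking single attributes out of the
<chart:series> context.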
Different variations (types) are possible for the histogram chart. You
need to specify in the text how the bins are calculated, especially how
'automatic' works and how overflow and underflow bins influence the bin
intervals.
You use two attributes for an underflow bin, one stating whether such an
underflow bin exists and one with its value. I think they can be
combined. In implementation and schema the combined attribute would be
optional. The specification text then needs to state what is used when
this attribute is missing. The same applies to overflow. Excel has the
data type ST_DoubleOrAutomatic.
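A minimal sketch of how one combined, optional attribute could be read,
written in Python for brevity. The function name parse_bin_limit, the
token "auto" and the rule that a missing attribute means "no such bin"
are assumptions for illustration; the specification text has to define
the actual behavior.

```python
def parse_bin_limit(value):
    """Interpret one optional underflow or overflow attribute.

    Returns a (kind, limit) pair:
      ("none", None)   attribute missing, assumed here to mean no such bin
      ("auto", None)   limit is to be calculated by the application
      ("value", x)     explicit limit x
    """
    if value is None:
        return ("none", None)
    if value == "auto":  # assumed token, modeled on ST_DoubleOrAutomatic
        return ("auto", None)
    return ("value", float(value))


print(parse_bin_limit(None))
print(parse_bin_limit("auto"))
print(parse_bin_limit("12.5"))
```

One attribute is enough this way, because its absence already carries
the information that your separate boolean attribute holds now.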
The 'binCount' and 'binWidth' information is coupled to the chart
variants FrequencyType=2 'Number of bins (BinCount)' and FrequencyType=1
'Bin Width'. You write them in all cases. For the variant
FrequencyType=3 'By Category' in particular, the binWidth attribute is
meaningless. On the other hand, for the variant FrequencyType=1 it is
mandatory. Michael, do you have a nice idea for the schema?
In the implementation both might be optional, with an assert if one is
missing for its corresponding FrequencyType. For UNO it might be
sufficient to make them optional and mention the dependencies in the text.
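The dependency between FrequencyType and the two values can be sketched
like this, again in Python. The function name attributes_to_export and
the attribute name loext:histogram-bin-count are hypothetical; the
numbering of FrequencyType follows your patch as I read it.

```python
def attributes_to_export(frequency_type, bin_width=None, bin_count=None):
    """Return only those bin attributes that are meaningful for the variant."""
    attrs = {}
    if frequency_type == 1:  # 'Bin Width'
        if bin_width is None:
            raise ValueError("FrequencyType=1 requires a bin width")
        attrs["loext:histogram-bin-width"] = str(bin_width)
    elif frequency_type == 2:  # 'Number of bins (BinCount)'
        if bin_count is None:
            raise ValueError("FrequencyType=2 requires a bin count")
        # Hypothetical attribute name, formed like the bin-width one.
        attrs["loext:histogram-bin-count"] = str(bin_count)
    # Other variants, e.g. 3 'By Category', write neither attribute.
    return attrs


print(attributes_to_export(1, bin_width=2.5))
print(attributes_to_export(3, bin_width=2.5, bin_count=10))
```

For FrequencyType=3 the result is empty even if values are passed, so a
meaningless binWidth never reaches the file.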
You write the new attributes with XML_NAMESPACE_CHART. It has to be
XML_NAMESPACE_LO_EXT.
You can use the histogram chart only in ODF extended. The corresponding
case distinctions are missing.
For attribute and element names, ODF uses a style with natural-language
terms separated by hyphens. Please keep this style. So instead of an
attribute loext:histogram-binwidth it should be
loext:histogram-bin-width. And instead of loext:histo it should be
loext:histogram.
On the one hand you use a UNO property FrequencyType with data type
short and possible values 0 to 3; on the other hand you assign the
property value to aFrequencies, which is a Sequence< double >. How does
that fit together?
For histograms Excel uses the element CT_Binning (see 2.24.3.7 in
[MS-ODRAWXML]). It has the attribute intervalClosed to determine
whether the start or the end side of the bin interval is open. The
corresponding attribute is missing in your patch.
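Why this attribute matters shows up for values that lie exactly on a
bin boundary. A minimal sketch in Python, assuming bins of equal width
and the tokens "r" and "l" for the closed side; the function name
bin_index is hypothetical, and the special handling of the very first
bin and of underflow/overflow is left out.

```python
import math


def bin_index(value, lower, width, closed="r"):
    """Index of the equal-width bin containing value; bins start at lower."""
    position = (value - lower) / width
    if closed == "r":
        # Bins are (a, b]: a boundary value belongs to the bin ending there.
        return math.ceil(position) - 1
    if closed == "l":
        # Bins are [a, b): a boundary value belongs to the bin starting there.
        return math.floor(position)
    raise ValueError("closed must be 'r' or 'l'")


print(bin_index(4.0, 0.0, 2.0, "r"), bin_index(4.0, 0.0, 2.0, "l"))
```

With bins of width 2 starting at 0, the value 4 falls into bin index 1
for right-closed intervals and into bin index 2 for left-closed ones.
Without the attribute in the file, the frequencies can differ after
loading.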
Kind regards,
Regina