Regarding ODF import and export support for HistogramChart

Devansh Varshney varshney.devansh614 at gmail.com
Thu Dec 19 14:32:01 UTC 2024


Hi everyone,

Thanks for such a detailed discussion. I have corrected certain parts of
the PR https://gerrit.libreoffice.org/c/core/+/177364
and the 'make' build has been running since 4:46 PM.

> You should specify the new chart type as it would be specified in the
> standard. That text can go to our Wiki, linked from
>
> https://wiki.documentfoundation.org/Development/ODF_Implementer_Notes/List_of_LibreOffice_ODF_Extensions.
>
> Writing it down helps you to become clear about the functionality and helps
> in writing the UNO information in the idl file. Currently the info in
> the idl file is not detailed enough. You can look at section "19.15
> chart:class" in ODF 1.3
> [https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.html]
> and at the corresponding information for Excel. Search for histogram on
> site:microsoft.com and look at its specification in [MS-ODRAWXML]. You
> need to extend the above-mentioned List_of_LibreOffice_ODF_Extensions in
> any case.

I have added it to the Implementer Notes, but I still need to write a more
detailed blog post. (Should I post that on the TDF/LO blog?)


> You must extend the schema. Those changes go to
>
> https://opengrok.libreoffice.org/xref/core/schema/libreoffice/OpenDocument-v1.4%2Blibreoffice-schema.rng.
>
> That is missing in your patch.

Done (in the PR)


> The histogram chart does not belong to the charts that are specified in
> the standard. Thus it needs a value for the chart:class attribute that
> has a loext prefix, e.g. chart:class="loext:histogram". A schema change
> is not needed for this value, because the data type for the value of
> this attribute is already 'namespacedToken'.
> You have added the 'bin' related information to the <chart:series>
> element. A <chart:plot-area> element can have several <chart:series>
> sub-elements. I guess that you do not want to allow several series in
> the same histogram. Excel does not allow it. Restricting it in the schema
> is difficult. (Or do you have an idea, Michael?) I suggest restricting
> it in the specification text.
> You export the labels for the x-axis as loext:BinRange. I would not
> export them at all, for these reasons:
> (A) Excel does not export that information.
> (B) The chart has a reference to the area of the data source in the
> table. The content of this area might come from an external source, e.g.
> a database engine. When the file is loaded, this data might be refreshed
> and change. Thus the bin labels and their frequency values might not
> match the information that was put into the file when saving.
> You write the 'bin' related information as attributes of the
> <chart:series> element. You should consider using one child element
> instead, which contains all needed information. That way you can use a
> dedicated context when loading the file. The schema would get one new
> child element for the <chart:series> element and a new section for this
> new element itself. Michael, what do you think?

I still have to discuss this with Tomaz.
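For illustration, here is a minimal Python sketch of the single-child-element layout suggested above: chart:class="loext:histogram" on <chart:chart>, and all bin settings on one element inside the only <chart:series>. The element name loext:histogram-binning and its attributes loext:bin-width and loext:interval-closed are hypothetical placeholders, not names from the patch or the schema. The has_single_series() helper shows how an importer could enforce the one-series restriction that the specification text would state.

```python
import xml.etree.ElementTree as ET

# ODF chart namespace and LibreOffice's extension (loext) namespace.
CHART_NS = "urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
LOEXT_NS = "urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0"
ET.register_namespace("chart", CHART_NS)
ET.register_namespace("loext", LOEXT_NS)


def chart(name: str) -> str:
    """Qualified name in the chart namespace, in ElementTree's {uri}local form."""
    return f"{{{CHART_NS}}}{name}"


def loext(name: str) -> str:
    """Qualified name in the loext namespace."""
    return f"{{{LOEXT_NS}}}{name}"


def build_histogram_chart(bin_width: float, interval_closed: str = "right") -> ET.Element:
    # The class value carries the loext prefix because a histogram is not
    # a chart type defined by the ODF standard.
    root = ET.Element(chart("chart"), {chart("class"): "loext:histogram"})
    plot_area = ET.SubElement(root, chart("plot-area"))
    series = ET.SubElement(plot_area, chart("series"))
    # HYPOTHETICAL: one dedicated child element holding all bin settings,
    # so the importer can use a dedicated context for it.
    ET.SubElement(series, loext("histogram-binning"), {
        loext("bin-width"): repr(float(bin_width)),
        loext("interval-closed"): interval_closed,
    })
    return root


def has_single_series(root: ET.Element) -> bool:
    """Importer-side check for the 'only one series per histogram' rule."""
    plot_area = root.find(chart("plot-area"))
    return plot_area is not None and len(plot_area.findall(chart("series"))) == 1


if __name__ == "__main__":
    print(ET.tostring(build_histogram_chart(2.5), encoding="unicode"))
```

Because the loext prefix is used for the child element, the serializer declares xmlns:loext on the root, which also makes the "loext:histogram" class value resolvable.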


> Different variations (types) are possible for the histogram chart. You
> need to specify in the text how the bins are calculated, especially how
> 'automatic' works and how overflow and underflow bins influence the bin
> intervals.

We use Scott's rule to calculate the bins automatically, which is also
what MSO uses; see
chart2/source/model/template/HistogramCalculator.cxx
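A minimal Python sketch of Scott's normal reference rule, h = 3.49 * s * n^(-1/3), where s is the sample standard deviation and n the number of values. This is an illustration of the rule, not a copy of HistogramCalculator.cxx: whether the real code uses the sample or the population standard deviation, and where the first bin starts (here: at the minimum), are assumptions, and exactly the kind of detail the specification text should pin down.

```python
import math
import statistics


def scott_bin_width(values):
    """Scott's normal reference rule: h = 3.49 * s * n^(-1/3)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return 3.49 * s * n ** (-1.0 / 3.0)


def automatic_bins(values):
    """Equal-width bins starting at min(values) and covering max(values)."""
    lo, hi = min(values), max(values)
    width = scott_bin_width(values)
    if width == 0:  # all values equal: a single degenerate bin
        return [(lo, hi)]
    count = max(1, math.ceil((hi - lo) / width))
    return [(lo + i * width, lo + (i + 1) * width) for i in range(count)]
```

For the values 1 to 8, s = sqrt(6) and n^(-1/3) = 0.5, so the width is about 4.27 and the range 1..8 needs two bins.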

Here are the changes for the *underflow and overflow calculations* (I
reverted them during the cleanup of the PR):
https://gerrit.libreoffice.org/c/core/+/170909/43/chart2/source/model/template/HistogramCalculator.cxx


   - Overflow bin: added at the end of maBinRanges and maBinFrequencies
     for values exceeding a threshold.
   - Underflow bin: inserted at the beginning for values below a
     threshold.
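The following Python sketch shows one way underflow and overflow bins could interact with the regular bins. It assumes right-closed intervals (a, b], an underflow bin for values <= the underflow threshold, an overflow bin for values > the overflow threshold, and regular bins that start at the underflow threshold (or the minimum) and end at the overflow threshold (or the maximum). The function name and these rules are assumptions for illustration; the two returned lists only mirror the roles of maBinRanges and maBinFrequencies.

```python
import math


def histogram_with_outlier_bins(values, bin_width, underflow=None, overflow=None):
    """Return (ranges, frequencies), with optional underflow/overflow bins.

    Assumptions: regular bins are right-closed (a, b] and start at the
    underflow threshold (or min(values)); the underflow bin holds values
    <= underflow, the overflow bin holds values > overflow.
    """
    start = underflow if underflow is not None else min(values)
    end = overflow if overflow is not None else max(values)
    count = max(1, math.ceil((end - start) / bin_width))
    ranges = [(start + i * bin_width, start + (i + 1) * bin_width) for i in range(count)]
    freqs = [0] * count
    below = above = 0
    for v in values:
        if underflow is not None and v <= underflow:
            below += 1
        elif overflow is not None and v > overflow:
            above += 1
        else:
            # right-closed bins; clamping keeps edge values such as
            # min(values) inside the regular bins
            idx = math.ceil((v - start) / bin_width) - 1
            freqs[min(max(idx, 0), count - 1)] += 1
    if underflow is not None:
        ranges.insert(0, (-math.inf, underflow))  # inserted at the beginning
        freqs.insert(0, below)
    if overflow is not None:
        ranges.append((overflow, math.inf))  # added at the end
        freqs.append(above)
    return ranges, freqs
```

With the values 0 to 10, bin width 2, underflow 2 and overflow 8, this yields the bins <=2, (2,4], (4,6], (6,8], >8 with frequencies 3, 2, 2, 2, 2.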


> You use two attributes for an underflow bin, one for whether such a bin
> exists and one for its value. I think they can be combined. In the
> implementation and the schema it would be optional. The specification
> text then needs to state what is used when this attribute is missing.
> Same for overflow. Excel has the data type ST_DoubleOrAutomatic.


I still have to do this.
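Combining the two underflow attributes (and likewise the overflow ones) into a single optional attribute could look like this minimal Python sketch, loosely modeled on Excel's ST_DoubleOrAutomatic: a missing attribute means there is no such bin, the keyword "auto" means the threshold is calculated automatically, and any other value must be a finite double. The keyword spelling, the helper names and the meaning of a missing attribute are assumptions that the specification text would have to fix.

```python
import math
from typing import Optional, Union

AUTO = "auto"  # HYPOTHETICAL keyword for an automatically calculated threshold

# None: no such bin, AUTO: automatic threshold, float: explicit threshold
OutlierBin = Union[None, str, float]


def parse_outlier_bin(attr: Optional[str]) -> OutlierBin:
    """Interpret a combined underflow/overflow attribute value on import."""
    if attr is None:  # attribute missing -> assumed: no such bin
        return None
    attr = attr.strip()
    if attr == AUTO:
        return AUTO
    threshold = float(attr)  # raises ValueError for malformed values
    if not math.isfinite(threshold):
        raise ValueError(f"threshold must be finite: {attr!r}")
    return threshold


def format_outlier_bin(setting: OutlierBin) -> Optional[str]:
    """Produce the attribute value on export; None means: omit the attribute."""
    if setting is None:
        return None
    if setting == AUTO:
        return AUTO
    return repr(float(setting))  # repr gives a round-trippable double
```

One attribute with three states replaces the "exists" flag plus value pair, and there is no longer a way to write contradictory combinations such as "no underflow bin, value 5".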


> You write the new attributes with XML_NAMESPACE_CHART. It has to be
> XML_NAMESPACE_LO_EXT.
>
Corrected

> You can use the histogram chart only in ODF extended. The corresponding
> case distinctions are missing.
>
>
> For attribute and element names, ODF uses a style of natural-language
> terms separated by hyphens. Please keep this style. So instead of an
> attribute loext:histogram-binwidth it should be
> loext:histogram-bin-width. And instead of loext:histo it should be
> loext:histogram.
>
Corrected

> On the one hand you use a UNO property FrequencyType with data type short
> and possible values 0 to 3; on the other hand you assign the property value
> to aFrequencies, which is a Sequence< double >???
>
Corrected

> For histograms, Excel uses the element CT_Binning (see 2.24.3.7 in
> [MS-ODRAWXML]). It has the attribute intervalClosed to determine
> whether the start or end side of the bin interval is open. The
> corresponding attribute is missing.

I have added it to the RNG file, but I still need to make changes in other
places too.
https://msopenspecs.azureedge.net/files/MS-ODRAWXML/%5bMS-ODRAWXML%5d-240820.pdf
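To show what the intervalClosed distinction changes, here is a minimal Python sketch that finds the bin of a value for right-closed bins (a, b] and for left-closed bins [a, b). The parameter values "right" and "left" are hypothetical names, not the spellings used in [MS-ODRAWXML], and putting the outermost edge into the first (or last) bin is an assumption so that the minimum (or maximum) value is still counted.

```python
import math


def bin_index(value, start, width, count, closed="right"):
    """Return the 0-based bin index of value, or None if it lies outside.

    Bin i spans start + i*width .. start + (i+1)*width with one side open:
    closed="right" -> (a, b], closed="left" -> [a, b).
    """
    pos = (value - start) / width
    if closed == "right":
        idx = math.ceil(pos) - 1
        if pos == 0:  # assumption: the first edge belongs to the first bin
            idx = 0
    elif closed == "left":
        idx = math.floor(pos)
        if pos == count:  # assumption: the last edge belongs to the last bin
            idx = count - 1
    else:
        raise ValueError(f"unknown closed side: {closed!r}")
    return idx if 0 <= idx < count else None
```

With the edges 0, 10, 20, 30, the value 10 falls into the first bin when the bins are right-closed and into the second bin when they are left-closed, so the frequencies differ as soon as a value lies exactly on an inner edge.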

Regarding Kurt's and Michael's replies: I will discuss with Tomaz (Quikee)
how he thinks I should approach this.


On Tue, 17 Dec 2024 at 21:46, Kurt Nordback <kurt.nordback at protonmail.com>
wrote:

> This bug is relevant to the question of handling multiple series in a
> histogram chart.
>
> https://bugs.documentfoundation.org/show_bug.cgi?id=163713
>
> Kurt
>
> On Monday, December 16th, 2024 at 11:02 PM, Mike Kaganski <
> mikekaganski at hotmail.com> wrote:
>
> > Hi Devansh, hi Regina,
> >
> > On 17.12.2024 4:51, Regina Henschel wrote:
> >
> > > You have added the 'bin' related information to the chart:series
> > > element. A chart:plot-area element can have several chart:series
> > > sub-elements. I guess that you do not want to allow several series in
> > > the same histogram. Excel does not allow it. Restricting it in the
> > > schema is difficult. (Or do you have an idea, Michael?) I suggest
> > > restricting it in the specification text.
> >
> >
> >
> > I suggest checking whether OOXML restricts it. Sticking to the current
> > Excel behavior is reasonable for the implementation, but hardcoding an
> > existing implementation detail of Excel in the standard's wording would
> > make it hard to adapt when Excel extends its implementation; it would
> > need a breaking change or a new chart class. (I don't know whether such
> > an extension could make sense in principle, so this is just a general
> > remark, maybe nonsensical in this context; sorry for that.)
> >
> >
> > --
> >
> > Best regards,
> >
> > Mike Kaganski
>


-- 
*Regards,*
*Devansh*