Regarding ODF import and Export support for HistogramChart

Devansh Varshney varshney.devansh614 at gmail.com
Wed Jan 1 08:47:03 UTC 2025


Hi everyone,

Here is a write-up of how the Histogram chart type works:
https://devanshvarshney.com/libreoffice-histogram-working


I am currently trying to get the patch for the custom binning UI option
working:

https://gerrit.libreoffice.org/c/core/+/170909

[image: image.png]

The checkbox option is modelled on how MSO presents it in its UI.

[image: Screenshot from 2024-12-31 01-56-08.png]

However, in this patch I have changed the calculation of the overflow and
underflow bins so that it depends on whether the user has entered a value
or not.

// Handle underflow bin (first bin)
if (i == 0 && binStart == std::numeric_limits<double>::lowest())
{
    aLabel = u"<"_ustr + OUString::number(binEnd);
}
// Handle overflow bin (last bin)
else if (i == binRanges.size() - 1 && binEnd == std::numeric_limits<double>::max())
{
    aLabel = u">"_ustr + OUString::number(binStart);
}


// Handle overflow bin if enabled
double overflowValue = std::numeric_limits<double>::quiet_NaN();
if (fOverflowValue.hasValue()) // Check if the Any contains a value
{
    fOverflowValue >>= overflowValue; // Extract the value
    if (!std::isnan(overflowValue)) // Check if the value is not NaN
    {
        sal_Int32 overflowCount
            = std::count_if(rDataPoints.begin(), rDataPoints.end(),
                            [overflowValue](double value) { return value > overflowValue; });

        // Add the overflow bin only if there are data points above the threshold
        if (overflowCount > 0)
        {
            maBinRanges.push_back(
                std::make_pair(overflowValue, std::numeric_limits<double>::max()));
            maBinFrequencies.push_back(overflowCount);
        }
    }
}

// Handle underflow bin if enabled
double underflowValue = std::numeric_limits<double>::quiet_NaN();
if (fUnderflowValue.hasValue()) // Check if the Any contains a value
{
    fUnderflowValue >>= underflowValue; // Extract the value
    if (!std::isnan(underflowValue)) // Check if the value is not NaN
    {
        sal_Int32 underflowCount
            = std::count_if(rDataPoints.begin(), rDataPoints.end(),
                            [underflowValue](double value) { return value <= underflowValue; });

        // Add the underflow bin only if there are data points at or below the threshold
        if (underflowCount > 0)
        {
            maBinRanges.insert(
                maBinRanges.begin(),
                std::make_pair(std::numeric_limits<double>::lowest(), underflowValue));
            maBinFrequencies.insert(maBinFrequencies.begin(), underflowCount);
        }
    }
}
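
For anyone who wants to try the threshold handling outside the tree, here
is a minimal standalone sketch of the same idea. It is not the code from
the patch: it assumes std::optional<double> in place of css::uno::Any, and
the names HistogramBins and addOutlierBins are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the calculator state (not the LibreOffice class).
struct HistogramBins
{
    std::vector<std::pair<double, double>> ranges; // (start, end) of each bin
    std::vector<int> frequencies;                  // number of values per bin
};

// Append an overflow bin and prepend an underflow bin, but only when a
// threshold is set and at least one data point lies beyond it.
void addOutlierBins(HistogramBins& bins, const std::vector<double>& data,
                    std::optional<double> underflow, std::optional<double> overflow)
{
    assert(bins.ranges.size() == bins.frequencies.size());

    if (overflow)
    {
        const double limit = *overflow;
        const auto count = std::count_if(data.begin(), data.end(),
                                         [limit](double v) { return v > limit; });
        if (count > 0)
        {
            bins.ranges.push_back(
                std::make_pair(limit, std::numeric_limits<double>::max()));
            bins.frequencies.push_back(static_cast<int>(count));
        }
    }

    if (underflow)
    {
        const double limit = *underflow;
        const auto count = std::count_if(data.begin(), data.end(),
                                         [limit](double v) { return v <= limit; });
        if (count > 0)
        {
            bins.ranges.insert(
                bins.ranges.begin(),
                std::make_pair(std::numeric_limits<double>::lowest(), limit));
            bins.frequencies.insert(bins.frequencies.begin(), static_cast<int>(count));
        }
    }
}
```

An unset optional plays the role of the "automatic" case here, so the
separate NaN check is not needed in this sketch.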


There are a couple of open problems:

- persisting the overflow and underflow bins,
- the correct way of representing the bins, and
- introducing IntervalClosed.
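
On the IntervalClosed point, the snippet in this mail counts the underflow
bin with <= and the overflow bin with >, which corresponds to intervals
closed on the right side. A rough standalone sketch of what a switchable
closed side could look like follows. The enum ClosedSide and the function
countBins are hypothetical names, equal-width bins are assumed, and closing
the outermost edge is my own choice so that the minimum or maximum value
is not dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical enum: which side of each bin interval is closed.
enum class ClosedSide
{
    Left, // [start, end)
    Right // (start, end]
};

// Count values into nBins equal-width bins beginning at `start`.
std::vector<int> countBins(const std::vector<double>& data, double start, double width,
                           std::size_t nBins, ClosedSide side)
{
    assert(width > 0.0 && nBins > 0);
    std::vector<int> freq(nBins, 0);
    for (double v : data)
    {
        for (std::size_t i = 0; i < nBins; ++i)
        {
            const double lo = start + static_cast<double>(i) * width;
            const double hi = lo + width;
            bool inside = (side == ClosedSide::Left) ? (v >= lo && v < hi)
                                                     : (v > lo && v <= hi);
            // Close the outermost edge so the extreme value still lands in a bin.
            if (side == ClosedSide::Left && i == nBins - 1 && v == hi)
                inside = true;
            if (side == ClosedSide::Right && i == 0 && v == lo)
                inside = true;
            if (inside)
            {
                ++freq[i];
                break;
            }
        }
    }
    return freq;
}
```

A value that sits exactly on a shared bin edge moves to the neighbouring
bin when the closed side changes, so the frequencies in the file and the
ones recalculated on load only agree if this setting is persisted too.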

So what I was thinking is this: if we can get this into master first, then
the work on the ODF import/export would become more approachable.


On Thu, 19 Dec 2024 at 20:02, Devansh Varshney <
varshney.devansh614 at gmail.com> wrote:

> Hi everyone,
>
> Thanks for such a detailed discussion. I have corrected certain parts of
> the PR https://gerrit.libreoffice.org/c/core/+/177364
> and the 'make' build is still running from 4:46 PM.
>
> You should specify the new chart type as it would be specified in the
>> standard. That text can go to our Wiki, linked from
>>
>> https://wiki.documentfoundation.org/Development/ODF_Implementer_Notes/List_of_LibreOffice_ODF_Extensions.
>>
>> Writing it down helps you to become clear about functionality and helps
>> in writing the UNO information in the idl-file. Currently the info in
>> the idl file is not detailed enough. You can look at section "19.15
>> chart:class" in ODF 1.3.
>> [
>> https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.html]
>>
>> and in the corresponding information for Excel. Search for histogram on
>> site:microsoft.com and look at its specification in [MS-ODRAWXML]. You
>> need to extend the above mentioned List_of_LibreOffice_ODF_Extensions in
>> any case.
>
> Added to the Implementer Notes but have to make a more detailed blog post.
> (should I post that on TDF/LO Blog?)
>
>
> You must extend the schema. Those changes go to
>>
>> https://opengrok.libreoffice.org/xref/core/schema/libreoffice/OpenDocument-v1.4%2Blibreoffice-schema.rng.
>>
>> That is missing in your patch.
>
> Done (in the PR)
>
>
> The histogram chart does not belong to the charts, that are specified in
>> the standard. Thus it needs a value for the chart:class attribute, that
>> has a loext prefix, e.g. chart:class="loext:histogram". A schema change
>> is not needed for this value, because the data type for the value of
>> this attribute is already 'namespacedToken'.
>> You have added the 'bin' related information to the <chart:series>
>> element. A <chart:plot-area> element can have several <chart:series>
>> sub-elements. I guess, that you do not want to allow several series in
>> the same histogram. Excel does no allow it. Restricting it in the schema
>> is difficult. (Or do you have an idea, Michael?) I suggest to restrict
>> it in the specification text.
>> You export the labels for the x-axis as loext:BinRange. I would not
>> export them at all for these reasons:
>> (A) Excel does not export that information.
>> (B) The chart has a reference to the area of the data source in the
>> table. The content of this area might come from an external source, e.g.
>> a database engine. When the file is loaded, this data might be refreshed
>> and changes. Thus the bin labels and their frequency values might not
>> fit to the information that are put into the file when saving.
>> You write the 'bin' related information as attributes of the
>> <chart:series> element. You should consider to use one child element
>> instead, that contains all needed information. That way you can use a
>> dedicated context when loading the file. The schema would get one new
>> child-element for the <chart:series> element and a new section for this
>> new element itself. Michael, what do you think?
>
> Still have to discuss this with Tomaz
>
>
> Different variations (types) are possible for the histogram chart. You
> need to specify in the text how the bins are calculated. Especially how
> 'automatic' works and how overflow and underflow bins influence the bin
> intervals.
>
> We are using Scott's rule to calculate the Histogram Chart bins
> automatically, which is also used by MSO.
> chart2/source/model/template/HistogramCalculator.cxx
>
> Here are those changes for the *Underflow and Overflow* *calculations* (I
> reverted these changes during the cleanup of the PR)
>
> https://gerrit.libreoffice.org/c/core/+/170909/43/chart2/source/model/template/HistogramCalculator.cxx
>
>
>    -     Overflow Bin: Added at the end of maBinRanges and
>    maBinFrequencies for values exceeding a threshold.
>    -     Underflow Bin: Inserted at the beginning for values below a
>    threshold.
>
>
> You use two attributes for a underflow bin, one whether such underflow
> exists and one with its value. I think that can be combined. In
> implementation and schema it would be optional. The specification text
> then needs to contain, what is used, when this attribute is missing.
> Same for overflow. Excel has data type ST_DoubleOrAutomatic.
>
>
> I have to do this
>
>
> You write the new attributes with XML_NAMESPACE_CHART. It has to be
>> XML_NAMESPACE_LO_EXT.
>>
> Corrected
>
> You can use the histogram chart only in ODF extended. The according case
>> distinctions are missing.
>>
>>
>> ODF uses for attributes and element names a style with natural language
>> terms separated by hyphen. Please keep this style. So instead of an
>> attribute loext:histogram-binwidth it should be
>> loext:histogram-bin-width. And instead of loext:histo it should be
>> loext:histogram.
>>
> Corrected
>
> On one hand you use a UNO property FrequencyType with datatype short and
>> possible value 0 to 3, on the other hand you assign the property value
>> to aFrequencies, which is a Sequence< double > ???
>>
> Corrected
>
> Excel uses for histograms the element CT_Binning (see 2.24.3.7 in
>> [MS-ODRAWXML]). That has the attribute intervalClosed to determine,
>> whether the start or end side of the bin interval is open. The
>> corresponding attribute is missing.
>
> Did add in the RNG file, but have to make changes in other places too.
>
> https://msopenspecs.azureedge.net/files/MS-ODRAWXML/%5bMS-ODRAWXML%5d-240820.pdf
>
> Regarding Kurt's and Michael's reply
> I will discuss with Tomaz(Quikee) what are his thoughts about how should I
> approach it.
>
>
> On Tue, 17 Dec 2024 at 21:46, Kurt Nordback <kurt.nordback at protonmail.com>
> wrote:
>
>> This bug is relevant to the question of handling multiple series in a
>> histogram chart.
>>
>> https://bugs.documentfoundation.org/show_bug.cgi?id=163713
>>
>> Kurt
>>
>>
>> Sent with Proton Mail secure email.
>>
>> On Monday, December 16th, 2024 at 11:02 PM, Mike Kaganski <
>> mikekaganski at hotmail.com> wrote:
>>
>> > Hi Devansh, hi Regina,
>> >
>> > On 17.12.2024 4:51, Regina Henschel wrote:
>> >
>> > > You have added the 'bin' related information to the chart:series
>> > > element. A chart:plot-area element can have several chart:series
>> > > sub-elements. I guess, that you do not want to allow several series in
>> > > the same histogram. Excel does no allow it. Restricting it in the
>> > > schema is difficult. (Or do you have an idea, Michael?) I suggest to
>> > > restrict it in the specification text.
>> >
>> >
>> >
>> > I suggest to check if OOXML restricts it. Sticking to the current Excel
>> > behavior is reasonable as an implementation; but hardcoding the existing
>> > implementation detail of Excel as a standard's wording would make it
>> > hard to adapt, when Excel extends the implementation - it would need a
>> > breaking change or a new chart class. (I don't know if such an extension
>> > could make sense in principle, so this is just a general remark, maybe
>> > nonsensical in this context - sorry for that.)
>> >
>> >
>> > --
>> >
>> > Best regards,
>> >
>> > Mike Kaganski
>>
>
>
> --
> *Regards,*
> *Devansh*
>


-- 
*Regards,*
*Devansh*
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 179570 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/libreoffice/attachments/20250101/f75f2adc/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot from 2024-12-31 01-56-08.png
Type: image/png
Size: 671336 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/libreoffice/attachments/20250101/f75f2adc/attachment-0003.png>

