[Libreoffice-bugs] [Bug 117041] New: Cumulative hypergeometric probability calculation error when discrete probabilities are too small

bugzilla-daemon at bugs.documentfoundation.org bugzilla-daemon at bugs.documentfoundation.org
Mon Apr 16 15:01:16 UTC 2018


https://bugs.documentfoundation.org/show_bug.cgi?id=117041

            Bug ID: 117041
           Summary: Cumulative hypergeometric probability calculation
                    error when discrete probabilities are too small
           Product: LibreOffice
           Version: 5.4.5.1 release
          Hardware: x86-64 (AMD64)
                OS: Windows (All)
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Calc
          Assignee: libreoffice-bugs at lists.freedesktop.org
          Reporter: charlesrv7 at gmail.com

Description:
HYPGEOM.DIST(x,NSample,Successes,Npopulation,Cumulative) function fails to
calculate cumulative probability (Cumulative=1) for situations where part of
the individual mass functions are too small and gives out "Err:502". What the
function should do is to equate those values to zero or do not consider them
when adding Pr(X=0) + Pr(X=1) + ... + Pr(X=x) to calculate Pr(X<=x).

That is what, for instance, MS Excel 2013 seems to do when calculating
cumulative hypergeometric probabilities. Its corresponding function
(HYPGEOM.DIST) does not give out wrong answers in situations where Calc does.

Example:
HYPGEOM.DIST(34,36,89,100,1) is 1.78862E+20 in Calc version 5.4.5.1 (x64). That
is absurd, as probabilities are limited to 1.

The correct answer is 0.9562476872.

There are countless input values that will give out erros too. For instance,
HYPGEOM.DIST(53,55,269,300,1) in Calc is 0.991282957, but the more correct
answer is 0.988500551. Such kind of erro is worse, because it is not patently
absurd.

Excel, R and hypergeometric calculators found in the Internet (for instance,
http://stattrek.com/online-calculator/hypergeometric.aspx) give the correct
answers.
It seems just discarding the individual mass probabilities for values of
Pr(X=x) where the results are so small they fall beyond (or bellow) machine
precision is enough to prevent the error.

I did that using the individual hypergeometric mass probability function from
Calc (either HYPGEOMDIST or HYPGEOM.DIST with "Cumulative=0") and the error is
avoided (see the "Aux" sheet in the .ods file I am sending attached).

Steps to Reproduce:
In any Calc cell enter, for instance, "=HYPGEOM.DIST(34,36,89,100,1)" or
"HYPGEOM.DIST(53,55,269,300,1)".

Actual Results:  
The incorrect results will be, respectively, "1.78862483604230E+20" and
"0.991282957036719".

Expected Results:
The correct results should be, respectively, "0.95624768719555" and
"0.988500551352757".


Reproducible: Always


User Profile Reset: No



Additional Info:
All I did seems to point to an easy solution to have the bug fixed: just equate
to zero or do not take into consideration the values of Pr(X=x) in error (value
too small, bellow machine precision).
I only tested in Windows 10 environment. I do not know if the error in
HYPGEOM.DIST also occurs in Linux environments.


User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36

-- 
You are receiving this mail because:
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/libreoffice-bugs/attachments/20180416/a369f9b4/attachment-0001.html>


More information about the Libreoffice-bugs mailing list