Acceptable outcomes of SwarmSolverTest::testUnconstrained

Wed Feb 28 01:36:38 UTC 2018

Hi,

On Tue, Feb 27, 2018 at 4:53 PM, Stephan Bergmann <sbergman at redhat.com> wrote:
> SwarmSolverTest::testUnconstrained in sccomp/qa/unit/SwarmSolverTest.cxx has
> already been weakened in the past,
> <https://cgit.freedesktop.org/libreoffice/core/commit/?id=1fa761af825641da5c87f80c2a17135f92418960>
> "Ridiculously large delta for SwarmSolverTest::testUnconstrained for now"
> and
> <https://cgit.freedesktop.org/libreoffice/core/commit/?id=0c3444c9bcee093ad5976af8948138e6f2a97706>
> "Weaken SwarmSolverTest::testUnconstrained even further for now".  The first
> one has the following in its commit message: "suggestion by Tomaž Vajngerl
> was: 'Let's adapt the delta for now. Generally anything close to 3 should be
> acceptable as the algorithm greatly depends on random values.'"
>
> Now <https://ci.libreoffice.org/job/lo_ubsan/833/console> failed with
>
>>
>> /home/tdf/lode/jenkins/workspace/lo_ubsan/sccomp/qa/unit/SwarmSolverTest.cxx:106:(anonymous
>> namespace)::SwarmSolverTest::testUnconstrained
>> double equality assertion failed
>> - Expected: 3
>> - Actual  : 94.6605927051114
>> - Delta   : 0.9
>
>
> Is that also an acceptable outcome, or does it indicate a bug somewhere that
> would need to be fixed?  What good is a test whose success criterion is the
> result of ad-hoc guesswork, instead of being determined precisely up-front
> when the test was written?
> Can that test please be fixed properly, so that it would be actually useful?

Well, it is neither - that's just the nature of stochastic algorithms.
It is not the fault of the test - the value it was defined with at the
beginning is the exact outcome we would expect (just like a global
maximum of a function is exactly one value). The problem is that the
algorithm itself doesn't guarantee that it finds that solution, or
comes close enough to it, within its allotted time or allotted number
of generations; it can also just get stuck in some local extreme
value. However, in a normal run of the algorithm on a fast enough CPU
this should only happen with a small statistical probability.
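
To illustrate the point, here is a minimal sketch. It is not the
SwarmSolver code: the objective function, the search range and the
names objective() and randomSearchMax() are all made up for this mail,
and plain random search stands in for the real algorithm. It only
shows that the exact optimum (3) is well defined, while the value
actually found depends on the random draws and on the budget.

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Hypothetical objective: a single global maximum of exactly 3 at x = 1.
double objective(double x) { return 3.0 - (x - 1.0) * (x - 1.0); }

// Plain random search (a stand-in, NOT the swarm algorithm): sample
// `evaluations` points (at least one) from [-100, 100] and keep the
// best objective value seen so far.
double randomSearchMax(unsigned seed, int evaluations)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> dist(-100.0, 100.0);
    double best = objective(dist(rng));
    for (int i = 1; i < evaluations; ++i)
        best = std::max(best, objective(dist(rng)));
    return best;
}
```

With the same seed, a larger budget can only improve the result or
leave it unchanged, and no run can ever report more than 3. How close
a particular run gets is a matter of probability, not a guarantee.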

Maybe I'm wrong, but I don't see this failing on the tinderboxes or on
Jenkins, so I wonder what ubsan does to make it fail. The algorithm
has a time limit; could it be that execution is slowed down so much
that the result doesn't have enough time to converge? (I didn't expect
that to be the case.) Could we skip it for ubsan only?
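
A rough sketch of the slowdown hypothesis, with the clock injected so
that a slow build can be simulated. The names runUntilDeadline() and
simulatedGenerations() are hypothetical, the cost figures are invented
and this is not how the solver is actually structured; it only shows
that under a wall-clock limit the number of completed generations
shrinks as each generation gets more expensive.

```cpp
#include <cassert>
#include <functional>

// Run `step` until the injected clock `now` reaches `limit` and return
// how many generations were completed within the time budget.
int runUntilDeadline(const std::function<double()>& now, double limit,
                     const std::function<void()>& step)
{
    int generations = 0;
    while (now() < limit)
    {
        step();
        ++generations;
    }
    return generations;
}

// Simulated run: every generation advances a fake clock by
// `costPerStep` time units (an invented cost, not a measurement).
int simulatedGenerations(double costPerStep, double limit)
{
    double clock = 0.0;
    return runUntilDeadline([&clock] { return clock; }, limit,
                            [&clock, costPerStep] { clock += costPerStep; });
}
```

With a budget of 100 time units, a cost of 1 per generation gives 100
generations, while a cost of 20 gives only 5. If something like that
is what happens under ubsan, a limit on the number of generations
instead of on time would make the outcome independent of CPU speed.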

Regards, Tomaž