Acceptable outcomes of SwarmSolverTest::testUnconstrained
sbergman at redhat.com
Wed Feb 28 08:41:44 UTC 2018
On 28.02.2018 02:36, Tomaž Vajngerl wrote:
> On Tue, Feb 27, 2018 at 4:53 PM, Stephan Bergmann <sbergman at redhat.com> wrote:
>> SwarmSolverTest::testUnconstrained in sccomp/qa/unit/SwarmSolverTest.cxx has
>> already been weakened twice in the past, by the commits
>> "Ridiculously large delta for SwarmSolverTest::testUnconstrained for now" and
>> "Weaken SwarmSolverTest::testUnconstrained even further for now". The first
>> one has the following in its commit message: "suggestion by Tomaž Vajngerl
>> was: 'Let's adapt the delta for now. Generally anything close to 3 should be
>> acceptable as the algorithm greatly depends on random values.'"
>> Now <https://ci.libreoffice.org/job/lo_ubsan/833/console> failed with
>>> double equality assertion failed
>>> - Expected: 3
>>> - Actual : 94.6605927051114
>>> - Delta : 0.9
>> Is that also an acceptable outcome, or does it indicate a bug somewhere that
>> would need to be fixed? What good is a test whose success criterion is the
>> result of ad-hoc guesswork, instead of being determined precisely up-front
>> when the test was written?
>> Can that test please be fixed properly, so that it would be actually useful?
> Well, it is neither - that's just the nature of stochastic algorithms.
> It is not the fault of the test - how it was defined at the beginning
> was the exact outcome we would expect (just like a global maximum of
> a function is exactly one value). The problem is that the algorithm
> itself doesn't guarantee to find that solution, or to come near it,
> within its allotted time or allotted number of generations; it can also
> just get stuck in some local extreme value. However, this should happen
> only with a small statistical probability in a normal run of the
> algorithm on a fast enough CPU.
Then those qualities of the algorithm need to be taken into account when
writing the test, I think. A small probability of failure is apparently
still a problem. We need tests to be reliable.
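
To illustrate what "taking those qualities into account" could look like,
here is a minimal sketch. It is not the sccomp swarm solver: objective(),
randomSearchMax() and checkRandomSearch() are hypothetical names, and the
search is a plain random search standing in for the real algorithm. The
assumptions are that the random generator can be given a fixed seed and
that the budget is a number of iterations rather than wall-clock time, so
that the outcome does not depend on how fast the machine is.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical objective with a single global maximum of 3.0 at x = 0.
double objective(double x) { return 3.0 - x * x; }

// Toy random search (NOT the sccomp swarm solver). The seed and the
// iteration count fully determine the result; no clock is consulted.
double randomSearchMax(std::uint32_t seed, int iterations)
{
    std::mt19937 rng(seed);
    double best = -1e300;
    for (int i = 0; i < iterations; ++i)
    {
        // Map the 32-bit generator output to [-10, 10) by hand, so the
        // sequence does not depend on a library-specific distribution.
        double x = -10.0 + 20.0 * (rng() / 4294967296.0);
        best = std::max(best, objective(x));
    }
    return best;
}

// Hypothetical test body: a fixed seed makes the run reproducible, and
// the delta is chosen for the known iteration budget.
void checkRandomSearch()
{
    double first = randomSearchMax(42u, 10000);
    double second = randomSearchMax(42u, 10000);
    assert(first == second);              // same seed, same result
    assert(std::fabs(first - 3.0) < 0.1); // close to the known maximum
}
```

With a fixed seed and an iteration budget, a slow ASan+UBSan build would
compute the same value as a fast one; whether the real solver can be
driven that way is something I have not checked.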
> Maybe I'm wrong but I don't see this failing in tinderboxes or
> Jenkins, so I wonder what ubsan does to make it fail. The algorithm
> has a time limit; could it be that the execution is slowed down so
> much that the result didn't develop enough? (I didn't expect this to
> be so.) Could we skip it for ubsan only?
Those ASan+UBSan tinderbox builds execute rather slowly, yes.
(<http://clang.llvm.org/docs/AddressSanitizer.html> claims "Typical
slowdown introduced by AddressSanitizer is 2x.")
But also as reported by others today on #libreoffice-dev:
> Feb 28 09:17:32 <buovjaga> sberg: I got a swamsolver failure yesterday. Then I pulled later and the next build went fine.
> Feb 28 09:19:03 <buovjaga> After the failure, soffice refused to start. I don't have logs, unfortunately