[Bug 105702] [CI] igt@kms_flip@* - fail - Failed assertion: (drmWaitVBlank(drm_fd, &wait)) == 0

Wed Jun 26 13:25:05 UTC 2019

https://bugs.freedesktop.org/show_bug.cgi?id=105702

Arek Hiler <arkadiusz.hiler at intel.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arkadiusz.hiler at intel.com

--- Comment #11 from Arek Hiler <arkadiusz.hiler at intel.com> ---
This is an old one that we haven't seen for 3.5 months. Prior to that, the
reproduction rate varied quite a bit; shortly before it disappeared, it was seen
about once every two weeks. This means we can close the bug after 20 weeks
(ten times that interval, ~= 5 months) without a sighting, so around mid-August
2019.

As for what happens, the code explains it pretty well (from calibrate_ts() in
kms_flip.c):

        memset(&wait, 0, sizeof(wait));
        wait.request.type = kmstest_get_vbl_flag(crtc_idx);
        wait.request.type |= DRM_VBLANK_RELATIVE | DRM_VBLANK_NEXTONMISS;
        do_or_die(drmWaitVBlank(drm_fd, &wait));

        last_seq = wait.reply.sequence;
        last_timestamp = wait.reply.tval_sec;
        last_timestamp *= 1000000;
        last_timestamp += wait.reply.tval_usec;

        memset(&wait, 0, sizeof(wait));
        wait.request.type = kmstest_get_vbl_flag(crtc_idx);
        wait.request.type |= DRM_VBLANK_ABSOLUTE | DRM_VBLANK_EVENT;
        wait.request.sequence = last_seq;
        for (n = 0; n < CALIBRATE_TS_STEPS; n++) {
                drmVBlank check = {};

                ++wait.request.sequence;
                do_or_die(drmWaitVBlank(drm_fd, &wait));

                /* Double check that haven't already missed the vblank */
                check.request.type = kmstest_get_vbl_flag(crtc_idx);
                check.request.type |= DRM_VBLANK_RELATIVE;
                do_or_die(drmWaitVBlank(drm_fd, &check));

                igt_assert(!igt_vblank_after(check.reply.sequence,
                                             wait.request.sequence));
        }

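So we first wait for the beginning of the next vblank to get its sequence
number (DRM_VBLANK_RELATIVE | DRM_VBLANK_NEXTONMISS). Then, for
CALIBRATE_TS_STEPS iterations, we wait for the very next vblank
(DRM_VBLANK_ABSOLUTE, ++wait.request.sequence) and double-check, by querying
the current counter, that it has not already moved past the requested sequence.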

drmWaitVBlank(drm_fd, &wait) seems to be failing with errno set to EBUSY (the
function itself returns -1). The relevant loop from libdrm's drmWaitVBlank():
    do {
       ret = ioctl(fd, DRM_IOCTL_WAIT_VBLANK, vbl);
       vbl->request.type &= ~DRM_VBLANK_RELATIVE;
       if (ret && errno == EINTR) {
           clock_gettime(CLOCK_MONOTONIC, &cur);
           /* Timeout after 1s */
           if (cur.tv_sec > timeout.tv_sec + 1 ||
               (cur.tv_sec == timeout.tv_sec && cur.tv_nsec >=
                timeout.tv_nsec)) {
                   errno = EBUSY;
                   ret = -1;
                   break;
           }
       }
    } while (ret && errno == EINTR);

out:
    return ret;

It seems we are missing the vblank with the given sequence number and then
bailing out after 1 s with EBUSY. That sounds rather serious and may point to a
bug in the kernel, as this little code should not take too long to run, even
with unfavorable scheduling. Let's see whether it reappears; hopefully it was
"accidentally fixed".
