etnaviv-gpu 134000.gpu: MMU fault status 0x00000002 on i.MX6 Quad Plus

Russell King - ARM Linux linux at armlinux.org.uk
Thu Aug 31 11:18:57 UTC 2017


I've just stumbled on a bug related to the way we handle fence
timeouts.

For DRM_ETNAVIV_WAIT_FENCE, we have:

struct drm_etnaviv_wait_fence {
        __u32 pipe;           /* in */
        __u32 fence;          /* in */
        __u32 flags;          /* in, mask of ETNA_WAIT_x */
        __u32 pad;
        struct drm_etnaviv_timespec timeout;   /* in */
};

where timeout is:

struct drm_etnaviv_timespec {
        __s64 tv_sec;          /* seconds */
        __s64 tv_nsec;         /* nanoseconds */
};

The timeout is with respect to the monotonic clock.  If the timeout is
specified far enough in the future, eg:

9088652.2192296410 now 4793684.242296410

then rather than waiting, the function returns almost immediately with
ETIMEDOUT.  The requested timeout is equivalent to (uint32_t)~0
milliseconds.
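
For illustration only, here is a minimal sketch of how a caller might turn
a relative millisecond count into an absolute deadline of the kind the
ioctl expects.  The struct and helper names are hypothetical stand-ins
(real code would use the UAPI header and read "now" from CLOCK_MONOTONIC),
and keeping tv_nsec below one second is a choice made by this sketch.  It
does not attempt to reproduce the figures quoted above.

```c
#include <stdint.h>

/* Hypothetical stand-in with the same two fields as drm_etnaviv_timespec. */
struct abs_timeout {
        int64_t tv_sec;         /* seconds */
        int64_t tv_nsec;        /* nanoseconds */
};

#define SKETCH_NSEC_PER_SEC     1000000000LL
#define SKETCH_NSEC_PER_MSEC    1000000LL

/*
 * Hypothetical helper: absolute deadline = now + ms.
 * Assumes now.tv_nsec is already in [0, 1e9); the result is kept
 * normalized in the same range.
 */
static struct abs_timeout deadline_after_ms(struct abs_timeout now, uint32_t ms)
{
        struct abs_timeout t;

        t.tv_sec = now.tv_sec + ms / 1000;
        t.tv_nsec = now.tv_nsec + (int64_t)(ms % 1000) * SKETCH_NSEC_PER_MSEC;
        if (t.tv_nsec >= SKETCH_NSEC_PER_SEC) {
                /* Carry the overflowed second into tv_sec. */
                t.tv_sec += 1;
                t.tv_nsec -= SKETCH_NSEC_PER_SEC;
        }
        return t;
}
```

Passing (uint32_t)~0 as ms gives a deadline roughly 4294967 seconds
(about 49.7 days) after "now".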

In the kernel, we take the drm_etnaviv_timespec, and stick it into a
struct timespec via the TS() macro.  This gets passed to
etnaviv_gpu_wait_fence_interruptible(), which uses
etnaviv_timeout_to_jiffies() to convert to jiffies.  I suspect that
the conversion to jiffies in timespec_to_jiffies() results in a
jiffy value that time_after() believes to be before the current time,
resulting in ultimately a zero jiffy timeout.
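
To make the suspicion concrete, here is a userspace model of the general
hazard.  It is not the driver's code: the function names are hypothetical,
and the counter is fixed at 32 bits to mimic unsigned long on a 32-bit ARM
system.  A wrap-safe "is a after b" test decides by the sign of the
wrapped difference, so a deadline that is 2^31 or more ticks ahead looks
as though it has already passed.

```c
#include <stdint.h>

/*
 * 32-bit model of a wrap-safe comparison: "a is after b" when the
 * wrapped difference (b - a), read as a signed value, is negative.
 */
static int model_time_after(uint32_t a, uint32_t b)
{
        return (int32_t)(b - a) < 0;
}

/*
 * Hypothetical model of "clamp to zero if the deadline looks like it
 * has passed, otherwise return the distance to it".
 */
static uint32_t model_remaining(uint32_t now, uint32_t deadline)
{
        if (model_time_after(now, deadline))
                return 0;
        return deadline - now;
}
```

With now = 1000, a deadline of 1100 gives 100 ticks remaining, but a
deadline 0x80000001 ticks ahead gives 0, i.e. an immediate timeout.
Whether this is what actually happens in the driver is exactly the part
I have not confirmed.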

Merely stracing the X server, or adding an fprintf(), is enough to
avoid the problem.

If you hit this problem, you'll see "fence finish failed" in the Xorg
log.

I think doing the time_after() dance after converting to jiffies is
wrong: if we're going to have an API that accepts absolute time, then
we should correctly handle times that are further in the future than
we are able to schedule a wait for.

It looks like other APIs that take a timespec or timeval (eg, ppoll(),
select(), pselect()) convert the timespec to a ktime value, which
limits to KTIME_MAX (see time*_to_ktime() and ktime_set()), which is
a much nicer behaviour than that which the etnaviv DRM driver is
currently giving us.
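
As a rough sketch of that saturating behaviour, modelled in userspace
with hypothetical model_* names (this is not the kernel's implementation,
and it assumes non-negative inputs with tv_nsec below one second): the
conversion pins anything too large to a maximum instead of wrapping, and
the remaining time is then a plain 64-bit subtraction.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_SEC      1000000000LL
#define MODEL_KTIME_MAX         INT64_MAX
#define MODEL_KTIME_SEC_MAX     (MODEL_KTIME_MAX / MODEL_NSEC_PER_SEC)

/*
 * Convert seconds + nanoseconds to a 64-bit nanosecond count,
 * saturating at MODEL_KTIME_MAX rather than overflowing.
 */
static int64_t model_ktime_set(int64_t secs, int64_t nsecs)
{
        if (secs >= MODEL_KTIME_SEC_MAX)
                return MODEL_KTIME_MAX;
        return secs * MODEL_NSEC_PER_SEC + nsecs;
}

/* Nanoseconds left until the deadline, or 0 if it has already passed. */
static int64_t model_remaining_ns(int64_t now_ns, int64_t deadline_ns)
{
        return deadline_ns > now_ns ? deadline_ns - now_ns : 0;
}
```

A deadline absurdly far in the future then simply becomes "wait as long
as we can", not an immediate ETIMEDOUT.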

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

