etnaviv-gpu 134000.gpu: MMU fault status 0x00000002 on i.MX6 Quad Plus

Lucas Stach l.stach at pengutronix.de
Thu Aug 31 12:49:27 UTC 2017


Hi Russell,

On Thursday, 31.08.2017 at 12:18 +0100, Russell King - ARM Linux
wrote:
> I've just stumbled on a bug related to the way we handle fence
> timeouts.
> 
> For DRM_ETNAVIV_WAIT_FENCE, we have:
> 
> struct drm_etnaviv_wait_fence {
>         __u32 pipe;           /* in */
>         __u32 fence;          /* in */
>         __u32 flags;          /* in, mask of ETNA_WAIT_x */
>         __u32 pad;
>         struct drm_etnaviv_timespec timeout;   /* in */
> };
> 
> where timeout is:
> 
> struct drm_etnaviv_timespec {
>         __s64 tv_sec;          /* seconds */
>         __s64 tv_nsec;         /* nanoseconds */
> };
> 
> The timeout is with respect to the monotonic clock.  If the timeout is
> specified far enough in the future, eg:
> 
> 9088652.2192296410 now 4793684.242296410
> 
> then rather than waiting, the function returns almost immediately with
> ETIMEDOUT.  The requested timeout is equivalent to (uint32_t)~0
> milliseconds.
> 
> In the kernel, we take the drm_etnaviv_timespec, and stick it into a
> struct timespec via the TS() macro.  This gets passed to
> etnaviv_gpu_wait_fence_interruptible(), which uses
> etnaviv_timeout_to_jiffies() to convert to jiffies.  I suspect that
> the conversion to jiffies in timespec_to_jiffies() results in a
> jiffy value that time_after() believes to be before the current time,
> resulting in ultimately a zero jiffy timeout.
> 
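
A minimal userspace sketch of one way the suspected failure could arise.
It is a model under stated assumptions, not the driver code: all model_*
names are hypothetical, the tick rate of 250 is assumed, and the clamp
value is modelled on (LONG_MAX >> 1) - 1 with a 32-bit long. The absolute
deadline saturates during conversion while the current tick count does
not, so a wrap-safe comparison sees the deadline as already past.

```c
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_HZ        250u         /* assumed tick rate */
#define MODEL_MAX_TICKS 0x3FFFFFFEu  /* assumed clamp: (LONG_MAX >> 1) - 1, 32-bit long */

/* Convert absolute seconds to ticks, saturating at MODEL_MAX_TICKS. */
static uint32_t model_abs_to_ticks(uint64_t sec)
{
    uint64_t ticks = sec * MODEL_HZ;

    return ticks > MODEL_MAX_TICKS ? MODEL_MAX_TICKS : (uint32_t)ticks;
}

/* Wrap-safe "a is after b" on 32-bit counters, in the style of time_after(). */
static int model_time_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/*
 * Compare after converting: returns the remaining ticks, or 0 when the
 * converted deadline appears to lie before the current tick count.
 */
static uint32_t model_remaining_ticks(uint64_t now_sec, uint64_t deadline_sec)
{
    uint32_t now = (uint32_t)(now_sec * MODEL_HZ); /* free-running, may wrap */
    uint32_t deadline = model_abs_to_ticks(deadline_sec);

    if (model_time_after(now, deadline))
        return 0;
    return deadline - now;
}
```

Under these assumptions, model_remaining_ticks(4793684, 9088652) yields 0
even though the deadline is weeks away, while a near deadline such as
model_remaining_ticks(10, 20) yields the expected 2500 ticks.
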
> Merely stracing the X server, or adding a fprintf() is enough to
> avoid the problem.
> 
> If you hit this problem, you'll see "fence finish failed" in the Xorg
> log.
> 
> I think doing the time_after() dance after converting to jiffies is
> wrong: if we're going to have an API that accepts absolute time, then
> we should correctly handle times that are beyond our ability to
> schedule the wait.
> 
> It looks like other APIs that take a timespec or timeval (eg, ppoll(),
> select(), pselect()) convert the timespec to a ktime value, which
> limits to KTIME_MAX (see time*_to_ktime() and ktime_set()), which is
> a much nicer behaviour than that which the etnaviv DRM driver is
> currently giving us.
> 
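The saturating approach described above could be sketched in userspace
roughly as follows. This is an illustration only, not a proposed patch:
the sketch_* names are hypothetical, INT64_MAX stands in for KTIME_MAX,
and the tick rate and tick limit are parameters the caller would have to
supply. The remaining time is computed in 64-bit nanoseconds first and
only then converted to ticks, clamping instead of wrapping.

```c
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical helper: seconds + nanoseconds to 64-bit nanoseconds,
 * saturating at INT64_MAX. Assumes 0 <= nsec < SKETCH_NSEC_PER_SEC.
 */
static int64_t sketch_ts_to_ns(int64_t sec, int64_t nsec)
{
    if (sec < 0)
        return 0;
    if (sec >= INT64_MAX / SKETCH_NSEC_PER_SEC)
        return INT64_MAX;
    return sec * SKETCH_NSEC_PER_SEC + nsec;
}

/* Time left until an absolute deadline; 0 if it has passed. Inputs >= 0. */
static int64_t sketch_remaining_ns(int64_t now_ns, int64_t deadline_ns)
{
    return deadline_ns > now_ns ? deadline_ns - now_ns : 0;
}

/*
 * Convert a non-negative relative time to ticks, rounding up and
 * saturating at max_ticks. hz must be nonzero.
 */
static uint32_t sketch_ns_to_ticks(int64_t ns, uint32_t hz, uint32_t max_ticks)
{
    int64_t ns_per_tick = SKETCH_NSEC_PER_SEC / hz;
    int64_t ticks = ns / ns_per_tick + (ns % ns_per_tick != 0);

    return ticks > (int64_t)max_ticks ? max_ticks : (uint32_t)ticks;
}
```

With a deadline of 9088652 s, a current time of 4793684 s, an assumed
rate of 250 ticks per second and an assumed limit of 0x3FFFFFFE ticks,
this yields the limit itself (a very long wait) rather than zero.
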
Are you going to provide a patch for this, or should I take a look at
fixing this?

Regards,
Lucas


