etnaviv-gpu 134000.gpu: MMU fault status 0x00000002 on i.XM6 Quad Plus
Russell King - ARM Linux
linux at armlinux.org.uk
Thu Aug 31 12:54:55 UTC 2017
On Thu, Aug 31, 2017 at 02:49:27PM +0200, Lucas Stach wrote:
> Hi Russell,
>
> Am Donnerstag, den 31.08.2017, 12:18 +0100 schrieb Russell King - ARM
> Linux:
> > I've just stumbled on a bug related to the way we handle fence
> > timeouts.
> >
> > For DRM_ETNAVIV_WAIT_FENCE, we have:
> >
> > struct drm_etnaviv_wait_fence {
> > __u32 pipe; /* in */
> > __u32 fence; /* in */
> > __u32 flags; /* in, mask of ETNA_WAIT_x */
> > __u32 pad;
> > struct drm_etnaviv_timespec timeout; /* in */
> > };
> >
> > where timeout is:
> >
> > struct drm_etnaviv_timespec {
> > __s64 tv_sec; /* seconds */
> > __s64 tv_nsec; /* nanoseconds */
> > };
> >
> > The timeout is with respect to the monotonic clock. If the timeout is
> > specified far enough in the future, eg:
> >
> > 9088652.2192296410 now 4793684.242296410
> >
> > then rather than waiting, the function returns almost immediately with
> > ETIMEDOUT. The requested timeout is equivalent to (uint32_t)~0
> > milliseconds.
> >
> > In the kernel, we take the drm_etnaviv_timespec, and stick it into a
> > struct timespec via the TS() macro. This gets passed to
> > etnaviv_gpu_wait_fence_interruptible(), which uses
> > etnaviv_timeout_to_jiffies() to convert to jiffies. I suspect that
> > the conversion to jiffies in timespec_to_jiffies() results in a
> > jiffy value that time_after() believes to be before the current time,
> > resulting in ultimately a zero jiffy timeout.
> >
> > Merely stracing the X server, or adding a fprintf() is enough to
> > avoid the problem.
> >
> > If you hit this problem, you'll see "fence finish failed" in the Xorg
> > log.
> >
> > I think doing the time_after() dance after converting to jiffies is
> > wrong: if we're going to have an API that accepts absolute time, then
> > we should handle times that are beyond the ability for us to schedule
> > the wait correctly.
> >
> > It looks like other APIs that take a timespec or timeval (eg, ppoll(),
> > select(), pselect()) convert the timespec to a ktime value, which
> > limits to KTIME_MAX (see time*_to_ktime() and ktime_set()), which is
> > a much nicer behaviour than that which the etnaviv DRM driver is
> > currently giving us.
> >
> Are you going to provide a patch for this, or should I take a look at
> fixing this?
I'm still trying to work out exactly what's going on here - if I add
a fprintf() into the DDX or strace it, the -ETIMEDOUT error goes away.
If I halve the delay, or take one second off the delay (that being
~0 msec) the problem still happens. That doesn't tie up with the
conclusions I came to.
So no, at the moment I'm not intending to do anything about it until
I have a robust diagnosis.
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
More information about the etnaviv
mailing list