[PATCH 02/10] drm/etnaviv: mmuv2: don't map zero page
Guido Günther
agx at sigxcpu.org
Mon Jan 7 09:13:24 UTC 2019
Hi,
On Mon, Jan 07, 2019 at 09:50:52AM +0100, Lucas Stach wrote:
> Hi Guido,
>
> Am Sonntag, den 30.12.2018, 16:49 +0100 schrieb Guido Günther:
> > Hi Lucas,
> > On Wed, Dec 19, 2018 at 03:45:38PM +0100, Lucas Stach wrote:
> > > Keep the page at address 0 as faulting to catch any potential state
> > > setup issues early.
> >
> > This is a nice idea! But applying this and making mesa hit that page
> > leads to the process hanging in D state over here on GC7000:
> >
> > # [ 242.726192] INFO: task kworker/u8:2:37 blocked for more than 120 seconds.
> > [ 242.733010] Not tainted 4.18.0-00129-gce2b21074b41 #504
> > [ 242.738795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 242.746638] kworker/u8:2 D 0 37 2 0x00000028
> > [ 242.752144] Workqueue: events_unbound commit_work
> > [ 242.756860] Call trace:
> > [ 242.759318] __switch_to+0x94/0xd0
> > [ 242.762741] __schedule+0x1c0/0x6b8
> > [ 242.766239] schedule+0x40/0xa8
> > [ 242.769380] schedule_timeout+0x2f0/0x428
> > [ 242.773410] dma_fence_default_wait+0x1cc/0x2b8
> > [ 242.777951] dma_fence_wait_timeout+0x44/0x1b0
> > [ 242.782403] drm_atomic_helper_wait_for_fences+0x48/0x108
> > [ 242.787819] commit_tail+0x30/0x80
> > [ 242.791229] commit_work+0x20/0x30
> > [ 242.794642] process_one_work+0x1ec/0x458
> > [ 242.798659] worker_thread+0x48/0x430
> > [ 242.802331] kthread+0x130/0x138
> > [ 242.805557] ret_from_fork+0x10/0x1c
> >
> > This is in dmesg showing that we hit the first page:
> >
> > [ 65.907388] etnaviv-gpu 38000000.gpu: MMU fault status 0x00000002
> > [ 65.913497] etnaviv-gpu 38000000.gpu: MMU 0 fault addr 0x00000e40
> >
> > Without that patch it's sampling random data from that page but does not hang.
>
> GPU hangs after an MMU fault are expected. Or, more accurately, we
> actively request the GPU to stop by setting the exception bit in the
> page table.
Yeah. I put that in to show that this is the cause of the trouble above.
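
For reference, a quick way to check that the fault address from the dmesg
output falls into the first page. This is only a sketch: it assumes a 4 KiB
MMU page size, and the helper name page_index is made up for illustration.

```python
# Assumption: 4 KiB MMU pages; adjust PAGE_SHIFT if the page size differs.
PAGE_SHIFT = 12


def page_index(addr: int) -> int:
    """Return the index of the page containing addr (hypothetical helper)."""
    return addr >> PAGE_SHIFT


# Address taken from the "MMU 0 fault addr" line in dmesg.
fault_addr = 0x00000e40

# Index 0 means the access landed in the page at address 0.
print(page_index(fault_addr) == 0)
```
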
>
> A hanging GPU should trigger the scheduler timeout handler, which then
> makes sure to get the GPU back into a working state. So if things don't
> progress after the fault for you either the timeout handler is buggy on
> GC7000, or the fence signaling is broken somehow. I'll take a look at
> this.
This tree isn't based on a current linux-next yet, so if you're not seeing
this, let me forward-port our stuff to that and report back.
Cheers,
-- Guido