[PATCH 02/10] drm/etnaviv: mmuv2: don't map zero page

Lucas Stach l.stach at pengutronix.de
Mon Jan 7 15:02:33 UTC 2019


On Monday, 07.01.2019 at 10:13 +0100, Guido Günther wrote:
> Hi,
> On Mon, Jan 07, 2019 at 09:50:52AM +0100, Lucas Stach wrote:
> > Hi Guido,
> > 
> > On Sunday, 30.12.2018 at 16:49 +0100, Guido Günther wrote:
> > > Hi Lucas,
> > > On Wed, Dec 19, 2018 at 03:45:38PM +0100, Lucas Stach wrote:
> > > > Keep the page at address 0 as faulting to catch any potential state
> > > > setup issues early.
> > > 
> > > This is a nice idea! But applying this and making mesa hit that page
> > > leads to the process hanging in D state over here on GC7000:
> > > 
> > > [  242.726192] INFO: task kworker/u8:2:37 blocked for more than 120 seconds.
> > > [  242.733010]       Not tainted 4.18.0-00129-gce2b21074b41 #504
> > > [  242.738795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [  242.746638] kworker/u8:2    D    0    37      2 0x00000028
> > > [  242.752144] Workqueue: events_unbound commit_work
> > > [  242.756860] Call trace:
> > > [  242.759318]  __switch_to+0x94/0xd0
> > > [  242.762741]  __schedule+0x1c0/0x6b8
> > > [  242.766239]  schedule+0x40/0xa8
> > > [  242.769380]  schedule_timeout+0x2f0/0x428
> > > [  242.773410]  dma_fence_default_wait+0x1cc/0x2b8
> > > [  242.777951]  dma_fence_wait_timeout+0x44/0x1b0
> > > [  242.782403]  drm_atomic_helper_wait_for_fences+0x48/0x108
> > > [  242.787819]  commit_tail+0x30/0x80
> > > [  242.791229]  commit_work+0x20/0x30
> > > [  242.794642]  process_one_work+0x1ec/0x458
> > > [  242.798659]  worker_thread+0x48/0x430
> > > [  242.802331]  kthread+0x130/0x138
> > > [  242.805557]  ret_from_fork+0x10/0x1c
> > > 
> > > This is in dmesg showing that we hit the first page:
> > > 
> > >     [   65.907388] etnaviv-gpu 38000000.gpu: MMU fault status 0x00000002
> > >     [   65.913497] etnaviv-gpu 38000000.gpu: MMU 0 fault addr 0x00000e40
> > > 
> > > Without that patch it's sampling random data from that page but does not hang.
> > 
> > GPU hangs after an MMU fault are expected, or more accurately, we
> > actively request the GPU to stop by setting the exception bit in the
> > page table.
> 
> Yeah. I put that in to show that this is the cause of the trouble above.
> 
> > 
> > A hanging GPU should trigger the scheduler timeout handler, which then
> > makes sure to get the GPU back into a working state. So if things don't
> > progress after the fault for you, either the timeout handler is buggy on
> > GC7000, or the fence signaling is broken somehow. I'll take a look at
> > this.
> 
> This isn't a top notch linux-next based tree yet so if you're not seeing this
> let me forward port our stuff to that and report back again.

I've certainly seen the timeout handler working on GC7000, but with the
GC7000 support being relatively lightly tested right now, I wouldn't
bet on us handling all corner cases correctly.

If this is an issue on a recent kernel, I would certainly love to learn
what's going wrong.
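
To make the expected sequence concrete, here is a rough user-space
model of it: an access to the unmapped page at address 0 faults and
stops the GPU, the timeout handler resets it, and the fence of the job
in flight gets signaled so that waiters (like the commit_work in the
trace above) can make progress. This is only a sketch and not the
etnaviv code. All sketch_* names, the 4 KiB page size and signaling
the fence with an error code are assumptions made for illustration.

```c
/* Simplified user-space model, NOT the etnaviv driver code.
 * Every name below is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed 4 KiB GPU page */

struct sketch_fence {
	bool signaled;
	int error;		/* 0 on success, negative on failure */
};

struct sketch_gpu {
	bool hung;
	uint32_t fault_addr;
	struct sketch_fence *active_fence;	/* fence of the job in flight */
};

/* True if addr lies in the page at address 0, which is left unmapped. */
static bool sketch_hits_zero_page(uint32_t addr)
{
	return addr < SKETCH_PAGE_SIZE;
}

/* MMU fault: record the faulting address and stop the GPU. */
static void sketch_mmu_fault(struct sketch_gpu *gpu, uint32_t addr)
{
	assert(gpu != NULL);
	gpu->fault_addr = addr;
	gpu->hung = true;
}

/* Timeout handler: bring the GPU back and wake up anyone waiting on
 * the fence of the job that was running when the GPU stopped. */
static void sketch_timeout_handler(struct sketch_gpu *gpu)
{
	assert(gpu != NULL);
	if (!gpu->hung)
		return;

	gpu->hung = false;	/* stands in for the actual GPU reset */

	if (gpu->active_fence && !gpu->active_fence->signaled) {
		gpu->active_fence->error = -1;	/* assumed error marker */
		gpu->active_fence->signaled = true;
	}
	gpu->active_fence = NULL;
}
```

In this model, a waiter stuck forever corresponds to one of the two
failure modes named above: either sketch_timeout_handler() never runs
or bails out early, or it runs but the fence never gets signaled.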

Regards,
Lucas