[Bug 91278] Tonga GPU lock/reset fail with Unigine Valley

Sun Oct 11 09:14:11 PDT 2015

https://bugs.freedesktop.org/show_bug.cgi?id=91278

Grazvydas Ignotas <notasas at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |notasas at gmail.com

--- Comment #34 from Grazvydas Ignotas <notasas at gmail.com> ---
Created attachment 118824
  --> https://bugs.freedesktop.org/attachment.cgi?id=118824&action=edit
test kernel patch

(In reply to Michel Dänzer from comment #29)
> That is interesting, though; the radeonsi driver seems to think there should
> be something mapped at the faulting address. This indicates that either the
> kernel driver fails to handle the mapping properly, or maybe there's a
> problem with communicating the buffer mapping information from userspace to
> the kernel driver.

Judging by the symptoms, it feels like a caching/buffering problem somewhere.

If I understand the code right, most things are mapped write-combine, which
means the CPU is allowed to write the data in any order it likes. Looking at
the amdgpu/radeon code, there is a surprising lack of barriers; basically it's
just the mb() in amdgpu_ring_commit()/radeon_ring_commit() and that's it. But
that mb() doesn't guarantee that the writes issued before it arrive in program
order relative to each other, it just ensures that all of them are finished
before anything after the mb() statement.

So the question is: is it ok for the hardware if, in something like
amdgpu_ib_schedule(), the writes to the ring arrive before the writes to the
IB? I do admit I don't understand how the hardware works, for example what
triggers the hardware to start processing the ring contents. Perhaps the write
to the last word in the ring? If so, you clearly need a wmb() before the write
which triggers the hardware, so that everything is ready before the GPU kicks
in.
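
To illustrate the ordering I mean, here is a minimal userspace sketch, not
the actual amdgpu code. Everything in it (struct fake_gpu, submit(),
consume(), the sizes) is made up for illustration, and the C11 release fence
only stands in for the kernel's wmb(); a C11 fence by itself says nothing
about write-combined mappings or MMIO. The point is just the sequence: fill
the IB, fill the ring entry, barrier, and only then do the write that tells
the consumer there is new work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_WORDS 16u  /* hypothetical ring size */
#define IB_WORDS   4u   /* hypothetical IB size */

/* Hypothetical stand-in for the buffers shared between CPU and GPU. */
struct fake_gpu {
    uint32_t ib[IB_WORDS];      /* plays the role of the indirect buffer */
    uint32_t ring[RING_WORDS];  /* plays the role of the ring */
    _Atomic uint32_t wptr;      /* write pointer: the "trigger" write */
};

static void fake_gpu_init(struct fake_gpu *g)
{
    for (uint32_t i = 0; i < IB_WORDS; i++)
        g->ib[i] = 0;
    for (uint32_t i = 0; i < RING_WORDS; i++)
        g->ring[i] = 0;
    atomic_init(&g->wptr, 0);
}

/* Producer side: all payload writes first, then barrier, then trigger. */
static void submit(struct fake_gpu *g, const uint32_t *cmds, uint32_t n)
{
    assert(n <= IB_WORDS);
    uint32_t w = atomic_load_explicit(&g->wptr, memory_order_relaxed);

    for (uint32_t i = 0; i < n; i++)
        g->ib[i] = cmds[i];            /* 1. IB contents */
    g->ring[w % RING_WORDS] = n;       /* 2. ring entry: number of IB words */

    /* 3. Stand-in for wmb(): everything above must be visible before
     *    the write pointer update below. */
    atomic_thread_fence(memory_order_release);

    /* 4. The write that makes the consumer start processing. */
    atomic_store_explicit(&g->wptr, w + 1, memory_order_relaxed);
}

/* Consumer side: returns the sum of the IB words for the entry at rptr,
 * or -1 if nothing new has been published yet. */
static int64_t consume(struct fake_gpu *g, uint32_t rptr)
{
    uint32_t w = atomic_load_explicit(&g->wptr, memory_order_relaxed);
    if (w == rptr)
        return -1;

    /* Pairs with the release fence in submit(). */
    atomic_thread_fence(memory_order_acquire);

    uint32_t n = g->ring[rptr % RING_WORDS];
    int64_t sum = 0;
    for (uint32_t i = 0; i < n; i++)
        sum += g->ib[i];
    return sum;
}
```

Without step 3, a consumer that sees the new write pointer could in
principle still read stale ring or IB contents, which is the kind of failure
I suspect here.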

Attached is a debug kernel patch to test if my guess is correct. It's way
overkill and will trash performance, but it should show if this is a problem
related to CPU caching/buffering. I don't have the hardware to test this
myself.
