[PATCH] Revert "drm/radeon: Try evicting from CPU accessible to inaccessible VRAM first"

Thu Mar 23 15:31:01 UTC 2017

>
> Was userspace maybe performing concurrent CPU access to the BOs in
> question?

As far as I know Julien has demonstrated that this is not the case.

> I hope we can find a better solution.

Understood -- I thought you might not want to take this patch, but I went
ahead and sent it out because Christian requested it, and it seems like he
doesn't think VRAM bos should ever evict back to VRAM at all?

Is my understanding of the original commit correct in that it tries to
rewrite the eviction placements of CPU accessible bos so that they are
either size zero (fpfn and lpfn = start of inaccessible VRAM) or they are
in inaccessible VRAM (fpfn = start of inaccessible VRAM and lpfn = 0)?

In this case, to me it seems that the simplest fix would be to iterate
using i to rewrite all the VRAM placements instead of just the first one
(rbo->placements[i] instead of rbo->placements[0]). In the case where
RADEON_GEM_NO_CPU_ACCESS
is set, the second placement will be in CPU accessible VRAM, and that
doesn't seem correct to me as there is no longer any sort of ordering for
evictions. (Unfortunately I'm not currently in a position to test whether
this fixes our issue.) Sorry, I meant to make a note of this originally.

Also, I don't claim to understand this code well enough, but I wonder: if
these sorts of evictions are desirable, would it make more sense to treat
CPU inaccessible/accessible VRAM as distinct entities with their own lrus?

I should also note that we are experiencing another issue where the kernel
locks up in similar circumstances. As Julien noted, we get no output, and
the watchdogs don't seem to work. It may be the case that Xorg and our
process are calling ttm_bo_mem_force_space concurrently, but I don't think
we have enough information yet to say for sure. Reverting this commit does
not fix that issue. I have some small amount of evidence indicating that
bos flagged for CPU access are getting placed in CPU inaccessible memory.
Could that cause this sort of kernel lockup?

Thanks for your help.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20170323/7bdc065d/attachment.html>