[PATCH 27/27] drm/amdgpu: Fix GTT size calculation
Felix Kuehling
felix.kuehling at gmail.com
Sat Jul 13 20:24:57 UTC 2019
On 2019-04-30 at 1:03 p.m., Koenig, Christian wrote:
>>> The only real solution I can see is to be able to reliably kill shaders
>>> in an OOM situation.
>> Well, we can in fact preempt our compute shaders with low latency.
>> Killing a KFD process will do exactly that.
> I've taken a look at that thing as well and to be honest it is not even
> remotely sufficient.
>
> We need something which stops the hardware *immediately* from accessing
> system memory, and not wait for the SQ to kill all waves, flush caches
> etc...
>
> One possibility I've been playing around with for a while is to replace
> the root PD for the VMIDs in question on the fly. E.g. we just let it
> point to some dummy which redirects everything into nirvana.
>
> But implementing this is easier said than done...
Warming up this thread, since I just fixed another bug that was enabled
by artificial memory pressure due to the GTT limit.
I think disabling the PD for the VMIDs is a good idea. One problem is that
the HWS firmware updates PD pointers in the background for its VMIDs. So
this would require a reliable and fast way to kill the HWS first.
An alternative I thought about is disabling bus access at the BIF level,
if that's possible somehow. Basically we would instantaneously kill all
GPU system memory access, signal all fences or just remove all fences
from all BO reservations (reservation_object_add_excl_fence(resv, NULL))
to allow memory to be freed, let the OOM killer do its thing, and when
the dust settles, reset the GPU.
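The ordering in that sequence can be sketched as a userspace toy model.
This is not driver code: every type and function below (toy_gpu, toy_bo,
toy_disable_bus_access and so on) is hypothetical and only illustrates why
fences may be dropped only after the GPU can no longer reach system memory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in amdgpu or the kernel. */
enum { TOY_MAX_BOS = 4 };

struct toy_bo {
    bool fenced;   /* an unsignaled fence still guards this buffer */
    bool resident; /* the buffer still pins system memory */
};

struct toy_gpu {
    bool bus_access_enabled; /* stands in for BIF-level system memory access */
    bool needs_reset;        /* set once bus access has been cut */
    struct toy_bo bos[TOY_MAX_BOS];
};

/* Step 1: cut all GPU access to system memory at once. */
static void toy_disable_bus_access(struct toy_gpu *gpu)
{
    gpu->bus_access_enabled = false;
    gpu->needs_reset = true;
}

/*
 * Step 2: drop the fences on all resident buffers.
 * Refuses (returns -1) while the GPU can still touch system memory,
 * otherwise returns the number of fences dropped.
 */
static int toy_drop_all_fences(struct toy_gpu *gpu)
{
    int dropped = 0;

    if (gpu->bus_access_enabled)
        return -1;

    for (size_t i = 0; i < TOY_MAX_BOS; i++) {
        if (gpu->bos[i].resident && gpu->bos[i].fenced) {
            gpu->bos[i].fenced = false;
            dropped++;
        }
    }
    return dropped;
}

/* Step 3: stand-in for reclaim/OOM killer: free every unfenced buffer. */
static int toy_reclaim(struct toy_gpu *gpu)
{
    int freed = 0;

    for (size_t i = 0; i < TOY_MAX_BOS; i++) {
        if (gpu->bos[i].resident && !gpu->bos[i].fenced) {
            gpu->bos[i].resident = false;
            freed++;
        }
    }
    return freed;
}

/* Step 4: once memory has been freed, reset the GPU. */
static void toy_reset(struct toy_gpu *gpu)
{
    gpu->bus_access_enabled = true;
    gpu->needs_reset = false;
}

/* The whole sequence in order; returns the number of buffers freed. */
static int toy_emergency_oom(struct toy_gpu *gpu)
{
    int freed;

    toy_disable_bus_access(gpu);
    if (toy_drop_all_fences(gpu) < 0)
        return -1;
    freed = toy_reclaim(gpu);
    toy_reset(gpu);
    return freed;
}
```

In this model, calling toy_drop_all_fences() before toy_disable_bus_access()
fails, and toy_reclaim() frees nothing while buffers are still fenced. That
is the point of the ordering: memory only becomes reclaimable after bus
access is cut.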
Regards,
Felix
>
> Regards,
> Christian.