Graceful page fault handling for Vega/Navi

Huang, Ray Ray.Huang at amd.com
Wed Sep 4 23:03:28 UTC 2019


On Wed, Sep 04, 2019 at 05:02:21PM +0200, Christian König wrote:
> Hi everyone,
> 
> this series is the next puzzle piece for recoverable page fault handling on Vega and Navi.
> 
> It adds a new direct scheduler entity for VM updates which is then used to update page tables during a fault.
> 
> In other words, previously an application doing an invalid memory access would just hang and/or repeat the invalid access over and over again. Now the handling is modified so that the invalid memory access is redirected to the dummy page.
> 
> This needs the following prerequisites:
> a) The firmware must be new enough to allow re-routing of page faults.
> b) Fault retry must be enabled using the amdgpu.noretry=0 parameter.

On my side, I found that the "noretry" parameter does not work for VMID 0 VM faults.
If you see the same thing on your side, I'd like to check into it.

Thanks,
Ray


> c) Enough free VRAM to allocate page tables to point to the dummy page.
> 
> The re-routing of page faults currently only works on Vega10, so Vega20 and Navi will still need some more time.
> 
> Please review and/or comment,
> Christian.

